Telecommunications Science ›› 2023, Vol. 39 ›› Issue (8): 136-148. doi: 10.11959/j.issn.1000-0801.2023165

• Column: Computing Power Network

Research on multi-objective optimization of computing power networks based on policy-constrained reinforcement learning

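Linjiang SHEN1, Chang CAO2, Chao CUI1, Yan ZHANG2   

  1. 1 Inspur Communication Information System Co., Ltd., Jinan 250100, Shandong, China
    2 Research Institute of China United Network Communications Co., Ltd., Beijing 100048, China
  • Revised: 2023-08-06 Online: 2023-08-01 Published: 2023-08-01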
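  • About the authors: Linjiang SHEN (1981- ), male, is deputy general manager of Inspur Communication Information System Co., Ltd. and head of its Computing Power Network Research Institute. He works mainly on frontier theoretical analysis, technology research, and product design related to computing power networks.
    Chang CAO (1984- ), male, PhD, is director of the Future Network Research Department at the Research Institute of China United Network Communications Co., Ltd. and a senior engineer. His research covers computing power networks, IPv6+ network technologies, and future network architecture.
    Chao CUI (1993- ), male, works at Inspur Communication Information System Co., Ltd. His research covers computing power networks and AI algorithms.
    Yan ZHANG (1983- ), male, PhD, is a principal researcher in the Future Network Research Department at the Research Institute of China United Network Communications Co., Ltd. and a senior engineer. His research covers computing power networks, cloud-network convergence and cloud computing, and future network architecture.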

Research on constrained policy reinforcement learning based multi-objective optimization of computing power network

Linjiang SHEN1, Chang CAO2, Chao CUI1, Yan ZHANG2   

  1. 1 Inspur Communication Information System Co., Ltd., Jinan 250100, China
    2 Research Institute of China United Network Communications Co., Ltd., Beijing 100048, China
  • Revised: 2023-08-06 Online: 2023-08-01 Published: 2023-08-01

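Abstract (translated from the Chinese):

A computing power network must maximize system performance indicators while meeting users' service requirements. Existing methods mainly convert and solve the problem through multi-objective weighting, which leaves the hyperparameters hard to determine and generalizes poorly across scenarios. Building on an analysis of the characteristics of computing power network objectives, and using policy-constrained reinforcement learning, this work treats service requirements as constraints and system performance indicators as the optimization objective. A multi-level iterative strategy over value, policy, and hyperparameters guarantees users' service requirements in expectation while optimizing system performance. A multi-scale step length (MSL) method for hyperparameter search is also studied, which further improves the stability and accuracy of the system. Simulation results show that the proposed method converges well and remains stable under changes in both system architecture and load.

Keywords (translated from the Chinese): computing power network, multi-objective optimization, reinforcement learning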

Abstract:

The computing power network needs to maximize system performance indicators while meeting users' business needs. Existing methods rely mainly on multi-objective weighting, which makes the hyperparameters difficult to determine and gives poor cross-scenario applicability. Based on an analysis of the characteristics of computing power network objectives, user business requirements were taken as the policy constraints and the performance indicators of the computing power network were taken as the optimization goal, using constrained policy optimization. Through a value-policy-hyperparameter multi-level iterative strategy, user business needs were guaranteed in expectation and system performance was optimized. In addition, a multi-scale step length (MSL) method for hyperparameter optimization was studied, which further improved the stability and accuracy of the system. Simulation results show that the proposed method has good convergence and stability in the single terminal-single edge server case, in the multi-terminal-multi-edge server case, and under system load changes.

Key words: computing power network, multi-objective optimization, reinforcement learning
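
The value-policy-hyperparameter iteration described in the abstract can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors' algorithm: it assumes the hyperparameter is a Lagrange-style penalty weight `lam` on the constraint, and it reads "multi-scale step length" as "take a coarse step on `lam` when the constraint violation is large and a fine step when it is small". The abstract defines neither, and all numbers, names, and thresholds below are hypothetical.

```python
import math

# Hypothetical toy problem: three offloading targets (all numbers are made up).
REWARD = [1.0, 0.7, 0.2]   # system-performance reward of each action
COST = [3.0, 2.0, 1.0]     # service cost of each action (for example latency)
LIMIT = 2.0                # business requirement: expected cost <= LIMIT


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected(values, probs):
    return sum(v * p for v, p in zip(values, probs))


def improve_policy(lam, steps=50, lr=1.0):
    """Policy level: fit a softmax policy to the penalised reward r - lam * c."""
    penalised = [r - lam * c for r, c in zip(REWARD, COST)]
    logits = [0.0] * len(REWARD)
    for _ in range(steps):
        probs = softmax(logits)
        # Value level: expected penalised reward of the current policy.
        baseline = expected(penalised, probs)
        # Natural-gradient-style softmax update: each logit moves by its advantage.
        logits = [x + lr * (q - baseline) for x, q in zip(logits, penalised)]
    return softmax(logits)


def solve(outer_steps=200, coarse=0.08, fine=0.02, switch=0.1):
    """Hyperparameter level: adjust lam from the measured constraint violation."""
    lam = 0.0
    for _ in range(outer_steps):
        probs = improve_policy(lam)
        violation = expected(COST, probs) - LIMIT
        # Assumed multi-scale rule: coarse step far from feasibility, fine step near it.
        step = coarse if abs(violation) > switch else fine
        lam = max(0.0, lam + step * violation)
    return lam, improve_policy(lam)


if __name__ == "__main__":
    lam, probs = solve()
    print("lam:", lam)
    print("policy:", probs)
    print("expected cost:", expected(COST, probs))
    print("expected reward:", expected(REWARD, probs))
```

In this toy setting the unconstrained optimum (action 0) violates the cost limit, so `lam` rises until the policy settles on the action that meets the limit with the highest reward. Unlike a fixed weighted sum, the penalty weight is not chosen by hand but is driven by the constraint itself.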
