Journal on Communications ›› 2020, Vol. 41 ›› Issue (6): 51-60. doi: 10.11959/j.issn.1000-436x.2020117

• Academic paper

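Improved satellite resource allocation algorithm based on DRL and MOP

Pei ZHANG1,2, Shuaijun LIU3, Zhiguo MA2, Xiaohui WANG1, Junde SONG1

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. China Academy of Information and Communications Technology, Beijing 100191, China
  3. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Revised: 2020-05-20 Online: 2020-06-25 Published: 2020-07-04
  • About the authors: Pei ZHANG (1986- ), female, born in Sanmenxia, Henan, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. Her research interests include satellite communications, deep reinforcement learning, and neural networks. | Shuaijun LIU (1988- ), male, born in Xingtai, Hebei, Ph.D., is an assistant researcher at the Institute of Software, Chinese Academy of Sciences. His research interests include low earth orbit constellation networks, satellite-5G integration, and dynamic resource management. | Zhiguo MA (1978- ), male, born in Beijing, is a senior engineer at the China Academy of Information and Communications Technology. His research interests include 5G communications and satellite communications. | Xiaohui WANG (1972- ), male, born in Jiande, Zhejiang, is a lecturer at Beijing University of Posts and Telecommunications. His research interests include 5G communications and satellite communications. | Junde SONG (1938- ), male, born in Cangzhou, Hebei, Ph.D., is a professor at Beijing University of Posts and Telecommunications. His research interests include smart cities, 5G communications, and satellite communications.
  • Supported by:
    The National Key Research and Development Program of China (2018YFB0105105); The National Science and Technology Major Project of China (2018ZX03001016)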

Abstract:

To address the multi-objective optimization (MOP) problem of sequential decision-making for resource allocation in multi-beam satellite systems, a deep reinforcement learning (DRL) based algorithm, DRL-MOP, is proposed to improve system performance while also raising the satisfaction of user service demands. Building on DRL and MOP techniques, the algorithm models the dynamically changing system environment and the user arrival process, and takes the weighted sum of the normalized spectrum efficiency, energy efficiency, and service satisfaction index as the optimization objective, thereby optimizing the cumulative performance of both the system and its users. Simulation comparisons show that the proposed algorithm better solves the MOP problem for multi-beam satellite systems, achieving better system performance and user satisfaction with fast convergence and low complexity.

Key words: multi-beam satellite system, resource allocation, sequential decision-making, deep reinforcement learning, multi-objective optimization
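
The optimization objective named in the abstract, a weighted sum of normalized spectrum efficiency, energy efficiency, and satisfaction index accumulated over a decision sequence, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the metric names, the min-max normalization, the value bounds, the weights, and the discount factor are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical metric names; the abstract names the three quantities
# but not how they are represented in code.
METRICS = ("spectrum_efficiency", "energy_efficiency", "satisfaction_index")


def normalize(value: float, low: float, high: float) -> float:
    """Min-max normalize a value into [0, 1], clipping out-of-range inputs.

    Min-max scaling is an assumption; the abstract only says "normalized".
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def weighted_reward(
    raw: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
) -> float:
    """Scalarize the three objectives as a weighted sum of normalized metrics."""
    total_weight = sum(weights[m] for m in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[m] * normalize(raw[m], *bounds[m]) for m in METRICS)
    return score / total_weight  # rescale so the reward stays in [0, 1]


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Accumulate per-step rewards over a decision sequence.

    The discount factor gamma is an assumed hyperparameter.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

In a DRL setting, a per-step reward of this kind would be returned by the environment after each allocation decision, and the agent would be trained to maximize the accumulated return. Changing the weights shifts the trade-off among the three objectives.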
