Space-Integrated-Ground Information Networks (天地一体化信息网络) ›› 2023, Vol. 4 ›› Issue (1): 12-22. doi: 10.11959/j.issn.2096-8930.2023002

Special topic: Key Technologies for Networking and TT&C of Large-Scale Constellations

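Resource Scheduling Method for Integration of TT&C and Observation Based on Multi-Agent Deep Reinforcement Learning

Siyue CHENG, Haoran LI, Weigang BAI, Di ZHOU, Yan ZHU

  1. School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi 710071, China
  • Revised: 2023-03-01 Online: 2023-03-20 Published: 2023-03-01
  • About the authors: Siyue CHENG (成思玥, 1998-), female, master's student at the State Key Laboratory of Integrated Services Networks, Xidian University. Her main research interest is resource slicing for broadband civil aviation services in mega-constellation systems.
    Haoran LI (李浩然, 1991-), male, lecturer at the State Key Laboratory of Integrated Services Networks, Xidian University. His main research interests are resource allocation and task scheduling for large-scale satellite constellations, and swarm-intelligence emergence and collective behavior in networks.
    Weigang BAI (白卫岗, 1987-), male, associate professor at the State Key Laboratory of Integrated Services Networks, Xidian University. His main research interests are underwater acoustic communication networks, satellite communication networks, and the architecture, networking protocols, and simulation systems of space-air-ground-sea integrated networks.
    Di ZHOU (周笛, 1991-), female, associate professor at the State Key Laboratory of Integrated Services Networks, Xidian University. Her main research interests include mission planning and resource management for space information networks and resource management and control technologies for the satellite Internet.
    Yan ZHU (朱彦, 1993-), male, lecturer at the State Key Laboratory of Integrated Services Networks, Xidian University. His main research interests include end-to-end reliable transmission and quality-of-service guarantees.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1806100); Young Scientists Fund of the National Natural Science Foundation of China (62101410); Qinchuangyuan High-Level Innovation and Entrepreneurship Talent Project, Shaanxi Province (QCYRCXM-2022-228)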

Abstract:

With the development of satellite communication technology and the continuous expansion of constellation scale, the integration of TT&C and observation has become the mainstream trend. Large constellations, numerous scheduling objects, and the joint control of complex operations pose great challenges to integrated TT&C and observation resource scheduling in satellite networks. Constrained by the low solution efficiency of scheduling algorithms and by complex constraints, traditional TT&C resource scheduling uploads TT&C instructions in advance and executes tasks according to a fixed deployment, which makes it difficult to meet the scheduling needs of emergencies and urgent tasks. Therefore, an integrated TT&C and observation resource scheduling method based on the multi-agent actor-critic deterministic policy gradient algorithm (MADDPG) is proposed. Using centralized training and distributed execution, a multi-agent model of integrated TT&C and observation tasks is established, and scheduling policies are computed by analyzing the local information of neighboring agents, which improves task response speed. Based on the models and constraints of the integrated resource scheduling problem, the significant and interpretable constraints are selected, a multi-agent reinforcement learning model for resource scheduling is established, and simulation tests are carried out. The simulation results show that the task benefit of the proposed method is 22% higher than that of the traditional method.

Key words: integration of TT&C and observation, large-scale constellation system, resource scheduling, multi-agent deep reinforcement learning, task reward
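
The centralized-training, distributed-execution idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear actors, the linear critic, the observation vectors, and all names (`LinearActor`, `CentralCritic`, `actor_update`, `run_demo`) are hypothetical, chosen only so the information flow is visible. Each actor sees only its own local observation, while the critic, used at training time only, sees the joint observations and actions. The critic is held fixed here; a full method would also train it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinearActor:
    """Decentralized actor (hypothetical): local observation -> scalar action."""
    weights: List[float]

    def act(self, local_obs: List[float]) -> float:
        return sum(w * o for w, o in zip(self.weights, local_obs))


@dataclass
class CentralCritic:
    """Centralized critic (hypothetical): scores joint observations and actions.

    Q is linear, so dQ/da_i is simply the action weight of agent i.
    """
    obs_weights: List[List[float]]  # one weight vector per agent
    act_weights: List[float]        # one weight per agent's action

    def q_value(self, all_obs: List[List[float]], all_actions: List[float]) -> float:
        q = 0.0
        for w_vec, obs in zip(self.obs_weights, all_obs):
            q += sum(w * o for w, o in zip(w_vec, obs))
        q += sum(w * a for w, a in zip(self.act_weights, all_actions))
        return q

    def grad_wrt_action(self, agent_idx: int) -> float:
        return self.act_weights[agent_idx]


def actor_update(actors: List[LinearActor], critic: CentralCritic,
                 all_obs: List[List[float]], lr: float = 0.1) -> None:
    """One deterministic-policy-gradient ascent step per actor (training only).

    Chain rule: dQ/dtheta_i = dQ/da_i * da_i/dtheta_i = dQ/da_i * obs_i.
    """
    for i, (actor, obs) in enumerate(zip(actors, all_obs)):
        dq_da = critic.grad_wrt_action(i)
        actor.weights = [w + lr * dq_da * o for w, o in zip(actor.weights, obs)]


def run_demo():
    """Two agents; returns the critic's joint score before and after one update."""
    all_obs = [[1.0, 0.0], [0.0, 2.0]]  # made-up local observations
    actors = [LinearActor([0.0, 0.0]), LinearActor([0.0, 0.0])]
    critic = CentralCritic(obs_weights=[[0.0, 0.0], [0.0, 0.0]],
                           act_weights=[1.0, -0.5])

    # Distributed execution: each action depends only on that agent's observation.
    before = critic.q_value(all_obs, [a.act(o) for a, o in zip(actors, all_obs)])
    # Centralized training: the critic's gradient guides every actor.
    actor_update(actors, critic, all_obs)
    after = critic.q_value(all_obs, [a.act(o) for a, o in zip(actors, all_obs)])
    return before, after


if __name__ == "__main__":
    print(run_demo())
```

After training, only `LinearActor.act` is needed on each agent, which is why execution can proceed from local information without consulting the critic.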

CLC number:
