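Large-scale post-disaster user distributed coverage optimization based on multi-agent reinforcement learning

doi:10.11959/j.issn.1000-436x.2022131

Journal on Communications, 2022, Vol. 43, Issue (8): 1-16.

Academic paper

Wenjun XU¹, Silei WU¹, Fengyu WANG¹, Lan LIN¹, Guojun LI², Zhi ZHANG³

¹ School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
² Lab of BLOS Trusted Information Transmission, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
³ School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Revised: 2022-05-23 Published: 2022-08-25 Released online: 2022-08-01
About the authors:
Wenjun XU (1982- ), male, born in Anqing, Anhui; Ph.D.; professor and doctoral supervisor at Beijing University of Posts and Telecommunications. His research interests include B5G/6G intelligent wireless networks, semantic intelligent communication networks, UAV communication and networking, and cognitive radio networks.
Silei WU (1997- ), male, born in Beijing; master's student at Beijing University of Posts and Telecommunications. His research interests include intelligent wireless communication, distributed system design, machine learning, and reinforcement learning.
Fengyu WANG (1992- ), female, born in Shuozhou, Shanxi; Ph.D.; lecturer at Beijing University of Posts and Telecommunications. Her research interests include wireless artificial intelligence, integrated sensing and communication, and intellicise communication.
Lan LIN (1996- ), female, born in Hengshui, Hebei; doctoral student at Beijing University of Posts and Telecommunications. Her research interests include emergency communication, NOMA, non-convex optimization methods, and deep reinforcement learning.
Guojun LI (1978- ), male, born in Ziyang, Sichuan; Ph.D.; professor and doctoral supervisor at Chongqing University of Posts and Telecommunications. His research interests include beyond-line-of-sight wireless communication and networking in complex and harsh environments.
Zhi ZHANG (1977- ), male, born in Anping, Hebei; Ph.D.; associate professor and master's supervisor at Beijing University of Posts and Telecommunications. His research interests include mobile communication, electronic signal processing, and communication system design.
Supported by:
The National Key Research and Development Program of China (2019YFC1511302); The National Natural Science Foundation of China (61871057); The National Natural Science Foundation of China (61790553); The Fundamental Research Funds for the Central Universities (2019XD-A13)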

Abstract

To quickly restore emergency communication services for large numbers of post-disaster users, a distributed intellicise coverage optimization architecture based on multi-agent reinforcement learning (RL) is proposed. It targets two problems caused by the large number of access users: pronounced differences and dynamics in communication services, and the poor scalability of centralized algorithms. In the network characterization layer, a distributed k-sums clustering algorithm that accounts for differences in user services is designed. Each unmanned aerial vehicle base station (UAV-BS) starts from user demand, adjusts its local network structure natively and simply, and selects the features of cluster-center users as the input state of the multi-agent RL neural network. In the trajectory control layer, a multi-agent soft actor-critic (MASAC) algorithm is designed, in which each UAV-BS acts as an intelligent node and controls its own flight trajectory under a distributed-training, distributed-execution framework. Ensemble learning and curriculum learning are integrated to improve training stability and convergence speed. Simulation results show that the proposed distributed k-sums clustering algorithm outperforms k-means in average load efficiency and clustering balance, and that the MASAC-based UAV-BS trajectory control algorithm reduces the frequency of communication outages and improves the spectrum efficiency of the network, outperforming existing RL methods.

Key words: emergency communication, coverage optimization, multi-agent reinforcement learning, distributed training
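
The abstract names maximum-entropy RL (soft actor-critic) as the basis of the trajectory control layer. As a rough illustration of what "maximum entropy" changes in the learning target, the sketch below computes a generic entropy-regularised one-step target in the style of standard soft actor-critic. It is not the paper's MASAC update: the function name `soft_td_target`, the temperature value 0.2, and the choice of taking the minimum over critic estimates are assumptions made for this example, and how the paper combines its ensemble of networks is not stated on this page.

```python
def soft_td_target(reward, next_q_estimates, next_log_prob,
                   gamma=0.99, alpha=0.2, done=False):
    """Entropy-regularised one-step target, in the style of soft actor-critic.

    reward:            r_t, the immediate reward.
    next_q_estimates:  critic values Q_k(s', a') for an action a' sampled
                       from the current policy at the next state s'.
    next_log_prob:     log pi(a' | s') for that same sampled action.
    gamma:             discount factor (0.99 matches the parameter table).
    alpha:             temperature factor (0.2 is illustrative only).
    """
    # Assumption: pessimistic aggregation (minimum) over the critic estimates.
    q_next = min(next_q_estimates)
    # Soft state value: Q minus the temperature-weighted log-probability,
    # so less likely (more exploratory) actions receive an entropy bonus.
    soft_value = q_next - alpha * next_log_prob
    # Do not bootstrap past a terminal step.
    return reward + (0.0 if done else gamma * soft_value)


target = soft_td_target(reward=1.0, next_q_estimates=[2.0, 2.5],
                        next_log_prob=-1.0)
print(round(target, 3))  # 3.178
```

With `alpha = 0` the target reduces to the usual discounted Q-learning-style target; a larger `alpha` weights exploration more heavily.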

CLC number:

TN929.5

Wenjun XU, Silei WU, Fengyu WANG, Lan LIN, Guojun LI, Zhi ZHANG. Large-scale post-disaster user distributed coverage optimization based on multi-agent reinforcement learning[J]. Journal on Communications, 2022, 43(8): 1-16.

Figures/Tables: 17

Figure 1

Table 1 System and algorithm parameters

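Parameter	Meaning
$N, M$	Number of affected users and of UAV base stations
$N_{j}, M_{j}$	Number of users and of UAV base stations observable by UAV base station $j$
$\kappa_{1}, \kappa_{2}$	Parameters of the Beta distribution of the user activation state
$\mu_{i}, \sigma_{i}$	Mean and standard deviation of the traffic-type distribution of user $i$
$u_{i}, u_{j}$	Cluster-center user associated with user $i$ and with UAV base station $j$
$f_{c}$	Center frequency of ground or air-to-ground communication
$p_{i}, p_{u}, p_{j}$	Positions of user $i$, cluster-center user $u$, and UAV base station $j$
$P_{1}, P_{2}$	Transmit power of in-cluster users and of cluster-center users
$L, G$	Path loss and channel gain
$N_{0}$	Noise power at the receiver
$d_{i}, D_{i}$	Size of the new transmission task and of the total transmission task of user $i$
$R_{i}, R_{j}$	Spectral efficiency of user $i$ and of UAV base station $j$
$B$	Bandwidth of a ground-communication resource block
$n_{i, u}$	Number of resource blocks occupied by user $i$
$N_{c}$	Load threshold for in-cluster users
$\eta$	Average load efficiency
$t_{0}$	Number of frames stored in the historical observation data set
$\lambda$	User priority parameter
$\hat{\mu}, \hat{\sigma}$	Mean and standard deviation of the user priority parameter distribution
$y_{i, j}$	Cluster indicator of whether user $i$ belongs to cluster $j$
$g_{i_{1}, i_{2}}$	Dissimilarity measure of user $i_{1}$ with respect to user $i_{2}$
$s_{t}, a_{t}, r_{t}$	State, action, and reward function in reinforcement learning
$\gamma$	Discount factor
$\theta$	Deep neural network parameters
$\alpha$	Temperature factor of maximum-entropy reinforcement learning
$\eta$	Learning step size
$W$	Number of neural network groups in ensemble learning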
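
Several of the symbols above (transmit power, channel gain $G$, noise power $N_{0}$, spectral efficiency $R$, resource-block bandwidth $B$, and resource-block count $n_{i,u}$) are typically tied together by the Shannon capacity formula. The sketch below shows that relationship for a single interference-free link. This is an assumption made for illustration: the paper's actual channel and interference model is not reproduced on this page, and the function names and numeric values are hypothetical.

```python
import math


def dbm_to_watt(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((power_dbm - 30.0) / 10.0)


def spectral_efficiency(tx_power_w, channel_gain, noise_power_w):
    """Shannon spectral efficiency in bit/s/Hz, ignoring interference."""
    return math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)


def link_rate_bps(n_blocks, block_bandwidth_hz, efficiency):
    """Achievable rate when a user occupies n_blocks resource blocks."""
    return n_blocks * block_bandwidth_hz * efficiency


# Illustrative values only (not taken from the paper).
p = dbm_to_watt(20.0)        # 20 dBm transmit power, i.e. 0.1 W
n0 = dbm_to_watt(-114.0)     # noise power over one resource block
g = 1e-10                    # linear channel gain, i.e. -100 dB
r = spectral_efficiency(p, g, n0)
print(f"spectral efficiency: {r:.2f} bit/s/Hz")
print(f"rate with 2 blocks of 1 MHz: {link_rate_bps(2, 1e6, r) / 1e6:.2f} Mbit/s")
```

Because the rate scales linearly with the number of occupied resource blocks but only logarithmically with power, assigning blocks across users usually matters more for load balance than raising transmit power.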

Figures 2-7

Table 2 System and algorithm parameter settings for the emergency communication network

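Parameter	Value	Parameter	Value
Number of users $N$	500	Number of observed users $N_{j}$	$\frac{2N}{M}$
Number of neighboring base stations $M_{j}$	3	Bandwidth $B$/MHz	1
Air-to-ground frequency $f_{c}^{\mathrm{air}}$/GHz	4	Ground frequency $f_{c}^{\mathrm{ground}}$/GHz	2
Noise $N_{0}$/(dBm·Hz$^{-1}$)	-174	Maximum UAV speed/(m·s$^{-1}$)	10
Cluster-center power $P_{1}$/dBm	100	User power $P_{2}$/dBm	20
Load threshold $N_{c}$	5	Total task duration $T_{0}$/s	500
Activation parameter $\kappa_{1}$	2	Activation parameter $\kappa_{2}$	5
Penalty $\xi_{\mathrm{collision}}$	500	Penalty $\xi_{\mathrm{outage}}$	10
Step size $\eta_{1}$	0.001	Step size $\eta_{2,3}$	0.0001
Step size $\eta_{4,5,6}$	0.001	Ensemble learning dimension $W$	10
Discount factor $\gamma$	0.99	Number of experience replay samples	128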
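
A few of the settings above imply derived quantities that are easy to check by hand. The sketch below works three of them out: the noise power over one 1 MHz band, the mean of the Beta(2, 5) activation distribution, and the observed-user count $N_{j} = 2N/M$. The number of UAV base stations `M_EXAMPLE` is a hypothetical value chosen for this example, since $M$ is not listed in the table. Reading the Beta mean as an average activation level is likewise an assumption about how the distribution is used.

```python
import math

N_USERS = 500                 # number of users N (from the table)
BANDWIDTH_HZ = 1e6            # bandwidth B = 1 MHz (from the table)
N0_DBM_PER_HZ = -174.0        # noise power spectral density (from the table)
KAPPA_1, KAPPA_2 = 2.0, 5.0   # Beta activation parameters (from the table)
M_EXAMPLE = 10                # hypothetical number of UAV-BSs, not from the table


def noise_power_dbm(psd_dbm_per_hz, bandwidth_hz):
    """Total noise power in dBm over the given bandwidth."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def observed_users(n_users, n_uavs):
    """Observed users per UAV-BS under the table's rule N_j = 2N / M."""
    return 2 * n_users / n_uavs


print(round(noise_power_dbm(N0_DBM_PER_HZ, BANDWIDTH_HZ), 1))  # -114.0
print(round(beta_mean(KAPPA_1, KAPPA_2), 3))                   # 0.286
print(observed_users(N_USERS, M_EXAMPLE))                      # 100.0
```

With $N_{j} = 2N/M$, each UAV-BS observes twice its equal share of the users, so neighboring observation sets overlap rather than partition the user population.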

Figures 8-15
