Multi-agent reinforcement learning based dynamic optimization algorithm of CRE offset for heterogeneous networks

doi:10.11959/j.issn.1000-436x.2023235

Journal on Communications, 2023, Vol. 44, Issue (12): 86-98.

• Academic Paper •

Cheng ZHANG¹,², Jiaye ZHU¹, Zening LIU², Yongming HUANG¹,²

¹ National Mobile Communication Research Laboratory, Southeast University, Nanjing 211111, China
² Purple Mountain Laboratories: Networking, Communications and Security, Nanjing 211111, China

Revised: 2023-11-13 Online: 2023-12-01 Published: 2023-12-01
About the authors:
Cheng ZHANG (1988- ), male, born in Wangjiang, Anhui, Ph.D., associate professor and doctoral supervisor at Southeast University. His main research interests include space-time signal processing for wireless communication systems and machine-learning-aided intelligent optimization of wireless communications.
Jiaye ZHU (1998- ), female, born in Wuxi, Jiangsu, master's student at Southeast University. Her main research interest is multi-cell interference coordination in wireless communication networks.
Zening LIU (1993- ), male, born in Huai'an, Jiangsu, Ph.D., researcher at Purple Mountain Laboratories: Networking, Communications and Security. His main research interests include edge computing, intelligent interference optimization, and resource allocation.
Yongming HUANG (1977- ), male, born in Wujiang, Jiangsu, Ph.D., researcher at Purple Mountain Laboratories: Networking, Communications and Security, and doctoral supervisor at Southeast University. His main research interests include intelligent 5G/6G mobile communications and millimeter-wave wireless communications.
Supported by:
The National Natural Science Foundation of China (62225107); The National Natural Science Foundation of China (62271140); The Natural Science Foundation on Frontier Leading Technology Basic Research Project of Jiangsu (BK20222001); The Jiangsu Innovative and Entrepreneurial Talent Program (JSSCBS20211332)

Abstract

To meet the high throughput demand driven by the rapid growth in wireless network users, a multi-agent reinforcement learning based dynamic optimization algorithm for the cell range expansion (CRE) offset is proposed for interference scenarios in macro-pico heterogeneous networks. Built on the value decomposition network framework of cooperative multi-agent reinforcement learning, the algorithm lets every pico base station make an online, local decision on its own individual CRE offset by exploiting, and exchanging among pico base stations, the user distribution in the system and the interference levels those users experience. Simulation results show that, compared with the CRE = 5 dB baseline and the distributed Q-learning algorithm, the proposed algorithm has clear advantages in raising system throughput, balancing throughput across base stations, and improving edge-user throughput.

Keywords: heterogeneous network, cell range expansion, multi-agent reinforcement learning, value decomposition network algorithm
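
The value decomposition idea named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the per-agent Q-values below are random placeholders, the three pico base stations are hypothetical, and the candidate offsets {1, ..., 9} dB are taken from the paper's simulation parameters. The sketch only shows why an additive joint value lets each pico base station choose its CRE offset locally: maximizing the sum is the same as maximizing each term.

```python
import itertools
import numpy as np

# Candidate CRE offsets in dB (range taken from the paper's parameter table).
OFFSETS_DB = np.arange(1, 10)

def joint_q(per_agent_q, joint_action):
    """Additive (VDN-style) joint value: sum of each agent's value for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def decentralized_greedy(per_agent_q):
    """Each agent independently picks the action index with its highest local value."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)

# Placeholder Q-values for 3 hypothetical pico base stations (assumption, not paper data).
rng = np.random.default_rng(0)
per_agent_q = rng.normal(size=(3, len(OFFSETS_DB)))

# Local decisions made separately by each agent.
local = decentralized_greedy(per_agent_q)

# Brute-force search over all 9^3 joint actions for comparison.
brute = max(
    itertools.product(range(len(OFFSETS_DB)), repeat=3),
    key=lambda a: joint_q(per_agent_q, a),
)

chosen_offsets = [int(OFFSETS_DB[a]) for a in local]
print("local == brute-force joint optimum:", local == brute)
print("chosen offsets (dB):", chosen_offsets)
```

In a trained system the per-agent values would come from each base station's network evaluated on its local observation rather than from random numbers.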

CLC number: TN92

Cheng ZHANG, Jiaye ZHU, Zening LIU, Yongming HUANG. Multi-agent reinforcement learning based dynamic optimization algorithm of CRE offset for heterogeneous networks[J]. Journal on Communications, 2023, 44(12): 86-98.

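Table 1 (rows with a single value apply to the whole system)
Parameter	MBS	PBS
Cell radius/m	289	40
Transmit power/dBm	46	30
Distance-dependent path loss/dB	140.7 + 36.7 lg R	128.1 + 36.7 lg R
Carrier frequency/GHz	2
Bandwidth/MHz	20
Thermal noise density/(dBm·Hz⁻¹)	-174
Channel fading	Rayleigh fading
CRE offset range/dB	{1, 2, 3, 4, 5, 6, 7, 8, 9}
SINR threshold/dB	15
Minimum MBS-PBS distance/m	75
Minimum PBS-PBS distance/m	40
Minimum MBS-user distance/m	35
Minimum PBS-user distance/m	10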
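
To make the role of the CRE offset concrete, the following sketch applies the transmit powers and path-loss expressions from the parameter table above to a biased cell-association rule: a user attaches to the pico base station if its received pico power plus the offset exceeds the received macro power. Several points are assumptions rather than statements from the source: R is taken to be in kilometres, fading is ignored, the function names are hypothetical, and the user distances are chosen only so that the offset changes the outcome.

```python
import math

# Transmit powers in dBm and path-loss intercepts in dB, as listed in the parameter table.
TX_DBM = {"MBS": 46.0, "PBS": 30.0}
PL_INTERCEPT_DB = {"MBS": 140.7, "PBS": 128.1}
PL_SLOPE = 36.7

def path_loss_db(kind, dist_m):
    """Distance-dependent path loss; R is ASSUMED to be in km (unit not given in the source)."""
    return PL_INTERCEPT_DB[kind] + PL_SLOPE * math.log10(dist_m / 1000.0)

def rx_power_dbm(kind, dist_m):
    """Average received power without fading (a simplification)."""
    return TX_DBM[kind] - path_loss_db(kind, dist_m)

def associate(d_mbs_m, d_pbs_m, cre_offset_db):
    """Biased association: the offset is added to the pico power before comparison."""
    biased_pbs = rx_power_dbm("PBS", d_pbs_m) + cre_offset_db
    return "PBS" if biased_pbs > rx_power_dbm("MBS", d_mbs_m) else "MBS"

def noise_power_dbm(bandwidth_hz=20e6, density_dbm_hz=-174.0):
    """Thermal noise power over the system bandwidth."""
    return density_dbm_hz + 10.0 * math.log10(bandwidth_hz)

# Hypothetical user position: 200 m from the macro, 180 m from the pico.
for offset in (1, 2):
    print(offset, "dB ->", associate(200.0, 180.0, offset))
print("noise power (dBm):", round(noise_power_dbm(), 2))
```

For this hypothetical user the unbiased pico power is about 1.7 dB below the macro power, so a 1 dB offset leaves the user on the macro cell while a 2 dB offset moves it to the pico cell. Choosing such offsets per pico base station is the decision the proposed algorithm learns.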

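Table 2
Parameter	Value
Learning rate r_a	0.0005
Discount factor r_d	0.88
Reward adjustment factor μ	65
Initial value of the greedy policy ε₀	0.5
Maximum number of identical steps per episode	30
Maximum number of training episodes	500
Maximum number of training steps	100
Replay buffer capacity	10 000
Mini-batch size	32
Number of neurons in hidden layer 1	32
Number of neurons in hidden layer 2	32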
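
As a rough illustration of how these hyperparameters might enter a value-decomposition training loop, the sketch below collects them in a configuration object and computes a one-step temporal-difference target from a shared team reward. The class and function names are hypothetical, the ε schedule and the exact use of the reward adjustment factor are not specified on this page and are therefore left out, and the numeric inputs are made up.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Hyperparams:
    """Values copied from the hyperparameter table; field names are hypothetical."""
    learning_rate: float = 0.0005
    discount: float = 0.88
    reward_factor: float = 65.0
    epsilon_init: float = 0.5
    max_episodes: int = 500
    max_steps: int = 100
    replay_capacity: int = 10_000
    batch_size: int = 32
    hidden_sizes: tuple = (32, 32)

def vdn_td_target(team_reward, next_q_per_agent, discount, done=False):
    """One-step target: r + gamma * sum_i max_a Q_i(next observation, a)."""
    if done:
        return float(team_reward)
    bootstrap = float(np.sum(np.max(next_q_per_agent, axis=1)))
    return float(team_reward) + discount * bootstrap

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

hp = Hyperparams()
# Made-up next-step Q-values for 2 agents with 2 actions each.
next_q = np.array([[1.0, 3.0], [2.0, 0.5]])
target = vdn_td_target(1.0, next_q, hp.discount)
action = epsilon_greedy(next_q[0], 0.0, np.random.default_rng(0))
print("TD target:", target, "greedy action:", action)
```

With a discount factor of 0.88 the target weights the next step's summed per-agent maxima (3.0 + 2.0 here) slightly less than the immediate team reward.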
