Journal on Communications ›› 2020, Vol. 41 ›› Issue (7): 38-48. doi: 10.11959/j.issn.1000-436x.2020149

• Topic: Mobile Artificial Intelligence •

Distributed interference coordination based on multi-agent deep reinforcement learning

Tingting LIU, Yi’nan LUO, Chenyang YANG

  1. School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
  • Revised: 2020-03-07 Online: 2020-07-25 Published: 2020-08-01
  • About the authors: LIU Tingting (1982- ), female, born in Xi'an, Shaanxi, Ph.D., is an associate professor at Beihang University; her research interests include interference management, resource planning, and information prediction based on machine learning and wireless big data | LUO Yi’nan (1995- ), male, born in Dandong, Liaoning, is a master's student at Beihang University; his research interest is distributed interference coordination in ultra-dense networks | YANG Chenyang (1965- ), female, born in Hangzhou, Zhejiang, Ph.D., is a professor and doctoral supervisor at Beihang University; her research interests include caching and transmission resource management based on machine learning and wireless big data, and ultra-reliable low-latency communication
  • Supported by:
    The National Natural Science Foundation of China (61731002); The National Natural Science Foundation of China (61671036)

Abstract:

A distributed interference coordination strategy based on multi-agent deep reinforcement learning was proposed to meet the requirements of file-downloading traffic in interference networks. With the proposed strategy, the transmission scheme can be adjusted adaptively according to the interference environment and the traffic requirements, while only a limited amount of information is exchanged among nodes. Simulation results show that, for an arbitrary number of users and traffic requirements, the user-satisfaction loss of the proposed strategy relative to the optimal strategy with perfect future information does not exceed 11%.

Key words: multi-agent deep reinforcement learning, non-real-time traffic, distributed interference coordination, ultra-dense network
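
The abstract describes the approach only at a high level. As a rough illustration of the multi-agent setting it refers to (per-link agents adapting their transmissions to mutual interference and a download deadline), below is a minimal sketch. It is not the authors' algorithm: the paper uses multi-agent deep reinforcement learning with a small amount of information exchanged between nodes, whereas this sketch substitutes independent tabular Q-learning on purely local observations, and the channel model, state/action/reward design, and every numerical parameter are illustrative assumptions only.

```python
# Minimal sketch (NOT the paper's implementation): independent per-link agents
# learn when and how hard to transmit so that a file finishes before a deadline
# despite mutual interference.  All names and numbers below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS  = 3                # transmitter-receiver pairs, one agent per link
POWERS    = [0.0, 0.5, 1.0]  # action: stay silent, low power, or full power
N_ACTIONS = len(POWERS)
SLOTS     = 20               # time slots available to finish the download
FILE_BITS = 30.0             # normalized file size each user must receive
NOISE     = 0.1              # receiver noise power (assumed)
G_SELF    = 1.0              # direct-link channel gain (assumed)
G_CROSS   = 0.4              # cross-link (interference) gain (assumed)

# Each agent keeps its own Q-table over a discretized local state:
# (remaining file, remaining time, interference measured in the previous slot).
N_FILE_B, N_TIME_B, N_INTF_B = 6, 5, 4
q_tables = [np.zeros((N_FILE_B, N_TIME_B, N_INTF_B, N_ACTIONS))
            for _ in range(N_AGENTS)]

def bucket(x, n_buckets, hi):
    """Map x in [0, hi] to one of n_buckets indices."""
    return min(int(n_buckets * x / (hi + 1e-9)), n_buckets - 1)

def local_state(remaining, slot, interference):
    return (bucket(remaining, N_FILE_B, FILE_BITS),
            bucket(SLOTS - slot, N_TIME_B, SLOTS),
            bucket(interference, N_INTF_B, (N_AGENTS - 1) * G_CROSS))

def step(powers, remaining):
    """One slot of the toy interference channel."""
    p = np.asarray(powers, dtype=float)
    interference = G_CROSS * (p.sum() - p)            # interference at each receiver
    rate = np.log2(1.0 + G_SELF * p / (NOISE + interference))
    delivered = np.minimum(rate, remaining)           # cannot deliver more than is left
    return delivered, interference

def train(episodes=3000, eps=0.2, alpha=0.1, gamma=0.95):
    for _ in range(episodes):
        remaining = np.full(N_AGENTS, FILE_BITS)
        intf = np.zeros(N_AGENTS)
        for slot in range(SLOTS):
            states, actions = [], []
            for i in range(N_AGENTS):                 # each agent acts on local state only
                s = local_state(remaining[i], slot, intf[i])
                a = (rng.integers(N_ACTIONS) if rng.random() < eps
                     else int(np.argmax(q_tables[i][s])))
                states.append(s)
                actions.append(a)
            delivered, intf = step([POWERS[a] for a in actions], remaining)
            remaining = remaining - delivered
            done = (slot == SLOTS - 1)
            for i in range(N_AGENTS):
                # Reward: bits delivered this slot, plus a terminal bonus when the
                # whole file arrived before the deadline (a "satisfied" user).
                r = delivered[i] + (10.0 if done and remaining[i] <= 0 else 0.0)
                s_next = local_state(remaining[i], slot + 1, intf[i])
                target = r + (0.0 if done else gamma * q_tables[i][s_next].max())
                q = q_tables[i][states[i]]
                q[actions[i]] += alpha * (target - q[actions[i]])

def evaluate():
    """Greedy rollout: fraction of users whose download finishes in time."""
    remaining = np.full(N_AGENTS, FILE_BITS)
    intf = np.zeros(N_AGENTS)
    for slot in range(SLOTS):
        acts = [int(np.argmax(q_tables[i][local_state(remaining[i], slot, intf[i])]))
                for i in range(N_AGENTS)]
        delivered, intf = step([POWERS[a] for a in acts], remaining)
        remaining = remaining - delivered
    return float(np.mean(remaining <= 0))

if __name__ == "__main__":
    train()
    print("fraction of satisfied users (toy setup):", evaluate())
```

In this toy setup the deadline cannot be met if all three links transmit at full power in every slot, so the agents only reach high user satisfaction if they learn to stagger or reduce their transmissions, which is the kind of coordination effect the abstract quantifies.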
