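Multi-domain collaborative anti-jamming based on multi-agent deep reinforcement learning

Chinese Journal on Internet of Things, 2022, Vol. 6, Issue (4): 104-116. doi: 10.11959/j.issn.2096-3750.2022.00293

Biao ZHANG¹, Ximing WANG², Yifan XU¹, Wen LI¹, Hao HAN¹, Songyi LIU¹, Xueqiang CHEN¹

¹ College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
² College of Information and Communication, National University of Defense Technology, Wuhan 430010, China

Revised: 2022-08-22 Online: 2022-12-30 Published: 2022-12-01
About the authors:
Biao ZHANG (1999- ), male, master's student at the College of Communications Engineering, Army Engineering University of PLA. His research interests include intelligent anti-jamming communications and reinforcement learning.
Ximing WANG (1993- ), male, Ph.D., lecturer at the College of Information and Communication, National University of Defense Technology. His research interests include intelligent anti-jamming communications, wireless resource optimization, and multi-agent decision theory.
Yifan XU (1995- ), male, Ph.D., lecturer at the College of Communications Engineering, Army Engineering University of PLA. His research interests include wireless communications and intelligent anti-jamming communications.
Wen LI (1996- ), male, Ph.D. candidate at the College of Communications Engineering, Army Engineering University of PLA. His research interests include intelligent anti-jamming communications, reinforcement learning, game theory, and dynamic spectrum access.
Hao HAN (1996- ), male, Ph.D. candidate at the College of Communications Engineering, Army Engineering University of PLA. His research interests include intelligent spectrum countermeasures, intelligent anti-jamming communications, game theory, and machine learning.
Songyi LIU (1995- ), male, Ph.D. candidate at the College of Communications Engineering, Army Engineering University of PLA. His research interests include machine learning, intelligent anti-jamming communications, and wireless communication resource optimization.
Xueqiang CHEN (1985- ), male, Ph.D., associate professor at the College of Communications Engineering, Army Engineering University of PLA. His research interests include cognitive radio and wireless spectrum resource optimization.
Supported by:
The National Natural Science Foundation of China (62071488); The National Natural Science Foundation of China (61961010)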

Abstract

Dynamic transmission requirements and limited buffer space pose great challenges to wireless data transmission under malicious jamming. To address these problems, a joint decision method for collaborative anti-jamming channel selection and data scheduling in the distributed Internet of Things was studied from the perspectives of the frequency domain and the time domain. A data transmission model based on a multi-user Markov decision process was constructed, and a collaborative anti-jamming joint channel and data decision algorithm based on multi-agent deep reinforcement learning was proposed. Simulation results show that the proposed algorithm can effectively avoid both malicious jamming and co-channel interference. Compared with the algorithms used for comparison, the network throughput is significantly improved and the number of dropped packets is significantly reduced.

Key words: collaborative anti-jamming, channel selection, data scheduling, multi-agent reinforcement learning, deep learning
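
The joint decision described in the abstract can be pictured as one discrete action per user and time slot that pairs a channel with a data-scheduling choice. The sketch below is only an illustration under assumptions that the abstract does not state: the scheduling choice is taken to be the number of packets sent in the slot (0 to `max_packets`), and the names `encode_action`, `decode_action`, `num_channels` and `max_packets` are hypothetical. The defaults 10 and 5 follow the channel count M and the per-slot maximum H_max listed in the parameter table.

```python
from typing import Tuple

NUM_CHANNELS = 10   # M in the parameter table
MAX_PACKETS = 5     # H_max in the parameter table


def encode_action(channel: int, packets: int,
                  num_channels: int = NUM_CHANNELS,
                  max_packets: int = MAX_PACKETS) -> int:
    """Map (channel, packets) to one index in [0, M * (H_max + 1))."""
    if not 0 <= channel < num_channels:
        raise ValueError("channel out of range")
    if not 0 <= packets <= max_packets:
        raise ValueError("packets out of range")
    return channel * (max_packets + 1) + packets


def decode_action(index: int,
                  num_channels: int = NUM_CHANNELS,
                  max_packets: int = MAX_PACKETS) -> Tuple[int, int]:
    """Inverse of encode_action: return (channel, packets)."""
    if not 0 <= index < num_channels * (max_packets + 1):
        raise ValueError("index out of range")
    return divmod(index, max_packets + 1)
```

Under this assumed encoding each user would choose among 10 × 6 = 60 joint actions per slot, so a value network for one agent would need 60 outputs.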


Cite this article:

Biao ZHANG, Ximing WANG, Yifan XU, Wen LI, Hao HAN, Songyi LIU, Xueqiang CHEN. Multi-domain collaborative anti-jamming based on multi-agent deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2022, 6(4): 104-116.

Figures/Tables: 16 (Figures 1-13, Tables 1-3)


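Parameter	Value
Number of users	N = 5
Number of channels	M = 10
Buffer length	L = 10 packets
History duration	Φ = 200 ms
Packet arrival rate	λ = [1, 2, 3, 4, 5, 6, 7] packets/slot
Maximum data transmitted per slot	H_max = 5 packets
Transmit power	0 dBm
Minimum rate threshold	V_th = 7 Mbit/s
Communication quality-of-service threshold	β_th = 10 dB
Reward factor for successful transmission	ε = 0.5
Penalty factors for failed transmission	? = -0.1, η = -0.2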
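
To make the roles of the buffer length, the arrival rate and the per-slot transmission limit concrete, here is a minimal single-user buffer sketch. It rests on assumptions that the table does not state: arrivals are drawn from a Poisson distribution with mean λ, a slot transmits first and admits new arrivals afterwards, packets from a failed transmission stay in the buffer, and arrivals that do not fit are dropped. `step_buffer` and `poisson` are hypothetical helper names.

```python
import math
import random
from typing import Tuple

BUFFER_LEN = 10   # L: buffer length in packets
H_MAX = 5         # maximum packets transmitted per slot


def poisson(rate: float, rng: random.Random) -> int:
    """Draw a Poisson sample with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def step_buffer(queue: int, send: int, arrivals: int,
                success: bool) -> Tuple[int, int, int]:
    """Advance one slot; return (new_queue, delivered, dropped)."""
    send = max(0, min(send, H_MAX, queue))   # cannot send more than is queued
    delivered = send if success else 0       # assumption: failed packets stay queued
    queue -= delivered
    free = BUFFER_LEN - queue
    dropped = max(0, arrivals - free)        # overflow is lost
    queue += arrivals - dropped
    return queue, delivered, dropped
```

For example, `step_buffer(8, 5, 4, False)` returns `(10, 0, 2)`: a jammed slot delivers nothing, the buffer fills up, and two of the four arriving packets are lost. A call such as `poisson(3.0, random.Random(0))` would supply the `arrivals` argument for λ = 3.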

Parameter	Value
Number of jammed channels	4
Jammer power	30 dBm
Background noise power	-90 dBm/Hz
Jamming switching speed	15 ms
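
These power levels combine with the quality-of-service threshold β_th = 10 dB in a signal-to-interference-plus-noise ratio (SINR) test. The sketch below shows the dB arithmetic only. The two path-loss values are hypothetical placeholders, since no channel model is given here, and the noise figure is treated as a total in-band power of -90 dBm rather than a density, which is a simplifying assumption. `sinr_db` and `dbm_to_mw` are illustrative names.

```python
import math
from typing import Iterable

TX_POWER_DBM = 0.0        # user transmit power
JAMMER_POWER_DBM = 30.0   # jammer power
NOISE_DBM = -90.0         # background noise, treated as total in-band power (assumption)
BETA_TH_DB = 10.0         # quality-of-service threshold


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def sinr_db(signal_dbm: float, interferers_dbm: Iterable[float],
            noise_dbm: float = NOISE_DBM) -> float:
    """SINR in dB from received powers given in dBm."""
    denom_mw = dbm_to_mw(noise_dbm) + sum(dbm_to_mw(p) for p in interferers_dbm)
    return 10.0 * math.log10(dbm_to_mw(signal_dbm) / denom_mw)


# Hypothetical path losses in dB (not taken from the source).
USER_PATH_LOSS_DB = 60.0
JAMMER_PATH_LOSS_DB = 70.0

rx_signal = TX_POWER_DBM - USER_PATH_LOSS_DB        # -60 dBm at the receiver
rx_jammer = JAMMER_POWER_DBM - JAMMER_PATH_LOSS_DB  # -40 dBm at the receiver

clean = sinr_db(rx_signal, [])             # channel free of jamming
jammed = sinr_db(rx_signal, [rx_jammer])   # channel hit by the jammer
```

With these placeholder losses the jam-free channel gives about 30 dB, above β_th, while the jammed channel gives about -20 dB, far below it. Co-channel interference from other users could be checked the same way by adding their received powers to `interferers_dbm`.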

Parameter	Value
Number of iterations	2000
Discount factor	γ = 0.9
Learning rate	α = 0.02
Sampling batch size	B = 64
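
To show where the discount factor, learning rate and batch size enter a value-based learner, the following is a generic sketch of tabular Q-learning with uniform experience replay. It is not the paper's algorithm: a lookup table stands in for the neural network so that the code runs with the standard library alone, and `ReplayBuffer`, `td_update` and the `Transition` layout are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

GAMMA = 0.9       # discount factor
ALPHA = 0.02      # learning rate
BATCH_SIZE = 64   # transitions sampled per update

Transition = Tuple[int, int, float, int]  # (state, action, reward, next_state)


class ReplayBuffer:
    """Fixed-capacity store of past transitions with uniform sampling."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.data.append(transition)

    def sample(self, rng: random.Random,
               batch_size: int = BATCH_SIZE) -> List[Transition]:
        # Sample without replacement; return everything if fewer are stored.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def td_update(q: List[List[float]], batch: List[Transition]) -> None:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) per sample."""
    for state, action, reward, next_state in batch:
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
```

In a multi-agent setting each user would hold its own table (or network) and buffer and call `td_update` on a sampled batch once per iteration. How the agents coordinate is not specified here and is left out of the sketch.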

