Intelligent anti-jamming decision algorithm of bivariate frequency hopping pattern based on ET-PPO

doi:10.11959/j.issn.1000-0801.2022264

Abstract

To further improve the anti-interference ability of frequency hopping communication in complex electromagnetic environments, a proximal policy optimization (PPO) algorithm based on weighted importance sampling and eligibility traces (ET-PPO) was proposed. Time-varying parameters were introduced into the traditional frequency hopping pattern, and the bivariate frequency hopping pattern decision problem was modeled as a Markov decision process by constructing a state-action-reward triple. To address the high variance of the sample-based update of the actor network in PPO, weighted importance sampling was introduced to reduce the variance, and an action selection strategy based on the Beta distribution was used to improve stability during learning. To address the slow convergence of the evaluator (critic) network, the eligibility trace method was introduced, which better balanced convergence speed against reaching the global optimal solution. Comparative simulations in different electromagnetic interference environments show that ET-PPO has better adaptability and stability, and performs better against obstruction interference and sweep-frequency interference.

Key words: complex electromagnetic environment, bivariate frequency hopping pattern, proximal policy optimization, eligibility trace
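
The abstract names two generic techniques, weighted importance sampling and eligibility traces, without giving their formulas. The following is a minimal textbook-style sketch of both in plain Python, not the authors' ET-PPO implementation: the function names, the tabular value table, and the parameters `alpha`, `gamma`, and `lam` are all illustrative assumptions.

```python
# Hypothetical illustration only; none of these names come from the paper.

def ordinary_is(ratios, returns):
    """Ordinary importance sampling: divide by the sample count."""
    return sum(r * g for r, g in zip(ratios, returns)) / len(ratios)


def weighted_is(ratios, returns):
    """Weighted importance sampling: divide by the sum of the ratios.

    The estimate is a convex combination of the observed returns, so a
    single large ratio cannot push it outside their range.
    """
    total = sum(ratios)
    if total == 0.0:
        return 0.0
    return sum(r * g for r, g in zip(ratios, returns)) / total


def td_lambda_episode(values, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """One pass of tabular TD(lambda) with accumulating eligibility traces.

    `episode` is a list of (state, reward, next_state) tuples, with
    next_state set to None at termination. Returns an updated copy.
    """
    values = dict(values)
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0  # mark the visited state as eligible
        for s in values:
            # Every recently visited state shares in the current TD error.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam  # decay eligibility over time
    return values


if __name__ == "__main__":
    ratios, returns = [0.5, 4.0, 0.5], [1.0, 1.0, 1.0]
    print(ordinary_is(ratios, returns), weighted_is(ratios, returns))
    start = {"A": 0.0, "B": 0.0}
    episode = [("A", 0.0, "B"), ("B", 1.0, None)]
    print(td_lambda_episode(start, episode))
    print(td_lambda_episode(start, episode, lam=0.0))
```

With all returns equal to 1, the weighted estimate stays at 1 while the ordinary estimate is pulled away by the one large ratio, which is the variance effect the abstract refers to. In the second part, a nonzero `lam` lets the terminal reward update state `A` within the same episode, whereas `lam=0.0` leaves `A` unchanged; this is the sense in which traces can speed up value learning.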

CLC Number: TN914; TP181

Yibo CHEN, Zhijin ZHAO. Intelligent anti-jamming decision algorithm of bivariate frequency hopping pattern based on ET-PPO[J]. Telecommunications Science, 2022, 38(11): 86-95.
