Telecommunications Science ›› 2022, Vol. 38 ›› Issue (11): 86-95. doi: 10.11959/j.issn.1000-0801.2022264

• Research and Development •

Intelligent decision-making for bivariate frequency hopping patterns based on ET-PPO

Yibo CHEN, Zhijin ZHAO

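  1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
  • Revised: 2022-09-29 Online: 2022-11-20 Published: 2022-11-01
  • About the authors: Yibo CHEN (1998- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is cognitive radio.
    Zhijin ZHAO (1959- ), female, Ph.D., professor and doctoral supervisor at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests are signal processing and cognitive radio technology.
  • Supported by:
    The National Natural Science Foundation of China (U19B2016)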

Intelligent anti-jamming decision algorithm of bivariate frequency hopping pattern based on ET-PPO

Yibo CHEN, Zhijin ZHAO   

  1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  • Revised: 2022-09-29 Online: 2022-11-20 Published: 2022-11-01
  • Supported by:
    The National Natural Science Foundation of China (U19B2016)

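Abstract:

To further improve the anti-jamming capability of bivariate frequency hopping systems in complex electromagnetic environments, a proximal policy optimization algorithm with eligibility traces (ET-PPO) is proposed. Time-varying parameters are introduced on top of the traditional frequency hopping pattern, and the "bivariate" frequency hopping pattern decision problem is modeled as a Markov decision problem by constructing a state-action-reward triple. To address the high variance of the sample update method used by the PPO "actor" network, weighted importance sampling is introduced to reduce the variance, and a Beta-distribution action selection policy is adopted to improve stability during the learning phase. To address the slow convergence of the "critic" network, the eligibility trace method is introduced, which strikes a good balance between convergence speed and finding the global optimum. Comparative simulations in different electromagnetic interference environments show that ET-PPO has better adaptability and stability and performs well against blocking jamming and sweep-frequency jamming.

Key words: complex electromagnetic environment, bivariate frequency hopping pattern, proximal policy optimization, eligibility trace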
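
As a rough illustration of the eligibility trace idea mentioned for the critic, the following is a minimal tabular TD(λ) sketch with accumulating traces. It is not the authors' implementation: the function name `td_lambda_episode`, the dictionary-based value table, and the parameter values `alpha`, `gamma`, and `lam` are all assumptions chosen for illustration, and the paper's critic is a network rather than a table.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Update a tabular value estimate over one episode with TD(lambda).

    values:      dict mapping state -> current value estimate (modified in place)
    transitions: list of (state, reward, next_state, done) tuples
    All names and defaults here are illustrative assumptions.
    """
    traces = {s: 0.0 for s in values}  # one eligibility trace per state
    for s, r, s_next, done in transitions:
        # One-step TD error; a terminal transition does not bootstrap.
        target = r + (0.0 if done else gamma * values[s_next])
        delta = target - values[s]
        traces[s] += 1.0  # accumulating trace for the visited state
        for k in values:
            # Every state is credited in proportion to its trace.
            values[k] += alpha * delta * traces[k]
            traces[k] *= gamma * lam  # decay all traces
    return values
```

With `lam=0` this reduces to one-step TD, where only the most recently visited state is updated. Larger values of `lam` propagate each TD error further back along the trajectory, which is the mechanism usually credited with faster value learning.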

Abstract:

To further improve the anti-jamming ability of bivariate frequency hopping systems in complex electromagnetic environments, a PPO algorithm based on weighted importance sampling and eligibility traces (ET-PPO) was proposed. Time-varying parameters were added to the traditional frequency hopping pattern, and the bivariate frequency hopping pattern decision problem was modeled as a Markov decision problem by constructing a state-action-reward triple. To address the high variance of the sample update method used by the actor network of the PPO algorithm, weighted importance sampling was introduced to reduce the variance, and a Beta-distribution action selection strategy was used to stabilize the learning stage. To address the slow convergence of the critic network, the eligibility trace method was introduced, which better balanced convergence speed against the search for the global optimum. Comparative simulations in different electromagnetic interference environments show that ET-PPO has better adaptability and stability and performs well against blocking jamming and sweep-frequency jamming.

Key words: complex electromagnetic environment, bivariate frequency hopping pattern, proximal policy optimization, eligibility trace
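
The two actor-side ideas named in the abstract can be sketched as follows. This is a hedged illustration only: the abstract does not give the exact formulation, so the sketch assumes that "weighted importance sampling" means normalizing the clipped surrogate by the sum of the probability ratios instead of the batch size, and that the Beta policy output is rescaled linearly to a bounded action range. The names `clipped_surrogate` and `sample_beta_action` and all parameter values are hypothetical.

```python
import random


def clipped_surrogate(ratios, advantages, eps=0.2, weighted=True):
    """Clipped PPO surrogate over a batch of samples.

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    weighted=False divides by the batch size (ordinary importance sampling);
    weighted=True divides by the sum of the ratios (weighted importance
    sampling, assumed here as one plausible reading of the abstract).
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - eps, min(1.0 + eps, r))
        terms.append(min(r * a, clipped_r * a))
    denom = sum(ratios) if weighted else len(ratios)
    return sum(terms) / denom


def sample_beta_action(alpha, beta, low, high, rng=random):
    """Draw an action in [low, high] from a Beta(alpha, beta) policy."""
    u = rng.betavariate(alpha, beta)  # u lies in [0, 1]
    return low + (high - low) * u
```

Normalizing by the sum of the ratios trades a small bias for lower variance when a few ratios are large. A Beta policy has bounded support, so sampled actions never need to be truncated to fit a bounded parameter range, which is one common reason for preferring it over a Gaussian policy.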

