Security decision method for the edge of multi-layer satellite network based on reinforcement learning

doi:10.11959/j.issn.1000-436x.2022111

Journal on Communications, 2022, Vol. 43, Issue (6): 189-199.


Peiliang ZUO¹, Shaolong HOU¹,², Chao GUO¹, Hua JIANG¹,², Wenbo WANG³

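¹ Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China
² School of Communication Engineering, Xidian University, Xi'an 710068, China
³ School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China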

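Revised: 2022-04-25   Online: 2022-06-01   Published: 2022-06-01
About the authors: Peiliang ZUO (1991- ), male, born in Yantai, Shandong, Ph.D., lecturer at Beijing Electronic Science and Technology Institute. His research interests include satellite communication, cognitive radio, Internet of Things, information security, and software-defined networking.
Shaolong HOU (1999- ), male, born in Yuanping, Shanxi, master's student at Xidian University. His research interests include satellite communication and artificial intelligence.
Chao GUO (1987- ), female, born in Jiujiang, Jiangxi, lecturer at Beijing Electronic Science and Technology Institute. Her research interests include satellite communication, emergency communication, transmission control, network load balancing, information security, and Internet of Things.
Hua JIANG (1962- ), male, born in Datong, Shanxi, professor at Beijing Electronic Science and Technology Institute. His research interests include communication security, emergency communication, Internet of Things, and next-generation networks.
Wenbo WANG (1965- ), male, born in Anguo, Hebei, Ph.D., professor at Beijing University of Posts and Telecommunications. His research interests include wireless communication, 3G/4G/5G/6G communication, satellite communication, cognitive radio, Internet of Things, information security, and software-defined networking.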
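Supported by:
The National Natural Science Foundation of China (62001251, 62001252); “High-precision” Discipline Construction Project in Beijing Universities (202100130401); Foundation of the State Key Laboratory of Integrated Services Networks, Xidian University (ISN22-13)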


Abstract

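Objectives: Multi-layer satellite networks are an important component of space-air-ground integrated technology. This paper relies on the autonomous decision-making capability of satellite nodes to exploit task cooperation at the network edge, covering both the processing of sensed data (including encryption/decryption and compression) and its backhaul. With data security as a prerequisite and low transmission delay as the goal, it realizes edge decision-making for mission satellites in a multi-layer satellite network architecture.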

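Methods: This paper considers a multi-layer satellite network composed of low Earth orbit (LEO), medium Earth orbit (MEO), and geosynchronous Earth orbit (GEO) satellites. The LEO satellite nodes carry out observation and reconnaissance services (e.g., meteorological observation, geographic survey, and intelligence reconnaissance). The MEO satellites act as fog nodes in the edge scenario, and one of them serves as the fog computing center: it decides on which satellite nodes the observation data are compressed and encrypted, and which network is used for data backhaul. The GEO satellite has the widest coverage and the strongest computing capability. A deep reinforcement learning algorithm is used to make edge security decisions for the satellite network. Specifically, the edge center node obtains the state of the satellite network environment through a sensing system and, using the self-learning capability of deep reinforcement learning, fits the optimal data offloading strategy and link plan for the scenario. This makes full use of onboard resources and minimizes the average backhaul delay over the many observation tasks. The procedure is as follows. First, the edge center node observes the environment and obtains state elements such as the data volume of each observation satellite's task, the channel conditions, and the processing capability of the edge nodes; a deep Q-network (DQN) then maps the state to an action, producing an initial policy choice. Second, the policy acts on the satellite network and changes the environment state, and the environment evaluates the policy and feeds a reward back to the edge center node. Third, based on the new state and the reward, the edge center node computes the error and updates the Q values, refining its action selection policy so as to obtain higher rewards. This process is iterated until the optimal policy is obtained.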
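The DQN iteration described in the Methods can be illustrated with a minimal, hypothetical sketch; it is not the authors' implementation. It covers only the ε-greedy action choice, the experience replay buffer, and the temporal-difference target y = r + γ·max_a′ Q_target(s′, a′) used to update the Q values. The discount factor, ε bounds and decay rate, replay capacity Γ, and mini-batch size come from Table 1. Two further choices are assumptions: ε decays multiplicatively once per step, and the Keras Q-network is replaced by any callable that returns a list of Q-values. The state encoding, action set, and reward are not specified in this section and are left abstract.

```python
import random
from collections import deque

# Values from Table 1; the per-step multiplicative decay schedule is an assumption.
GAMMA = 0.9                                       # discount factor γ
EPS_START, EPS_MIN, EPS_DECAY = 0.9, 0.005, 0.995  # exploration factor ε and its decay
REPLAY_CAPACITY, BATCH_SIZE = 500, 32             # replay buffer Γ, mini-batch size


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=REPLAY_CAPACITY):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, eps):
    """Explore with probability eps, otherwise pick the action with the largest Q-value."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, target_q, gamma=GAMMA):
    """DQN targets: r for terminal transitions, else r + gamma * max_a' Q_target(s', a')."""
    return [
        reward if done else reward + gamma * max(target_q(next_state))
        for (_, _, reward, next_state, done) in batch
    ]


def decay_epsilon(eps):
    """Multiplicative decay, clipped at the lower bound listed in Table 1."""
    return max(EPS_MIN, eps * EPS_DECAY)


def toy_target_q(state):
    """Hypothetical stand-in for the target network: same Q-values for every state."""
    return [0.0, 2.0, 1.0]


if __name__ == "__main__":
    batch = [(0, 1, 1.0, 1, False), (1, 2, 1.0, 2, True)]
    # Non-terminal target is 1 + 0.9 * 2; terminal target is just the reward 1.
    print(td_targets(batch, toy_target_q))
```

In a full agent, the targets would be fitted by the Q-network (learning rate ξ in Table 1) on the sampled states and chosen actions, and the target network would be refreshed periodically; both steps are omitted here.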

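Results: Keras is used as the simulation platform, and the LEO constellation is assumed to be a common Walker constellation. A region of the multi-layer satellite network is taken as the simulation object, containing 8 LEO observation satellites, 3 MEO satellites, and 1 GEO satellite. The simulations cover three aspects. 1) Convergence of each method on random snapshots with different numbers of satellites: the proposed method shows a convergence trend for every satellite count tested, but the number of training iterations needed to converge increases markedly as satellites are added, because more satellites greatly enlarge the action space. 2) Performance of the proposed method under different network configurations: the proposed method shows the best convergence performance across all four configurations. In some snapshots, the low-high (LEO-GEO) configuration starts with excellent performance, but its converged performance becomes poorer as training proceeds, because that configuration offers fewer link choices, which limits its performance. 3) Comparison with baseline methods on a test set: compared with random edge security decisions and SNR-guided edge security decisions, the proposed method has a clear advantage in delay, and its performance is close to that of the optimal edge security decision obtained by exhaustive search.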
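The growth in training effort with satellite count, and the cost of the exhaustive-search optimum used as a reference, both follow from the size of the joint action space. The following is a hypothetical illustration only, because the paper's exact action definition is not given in this section. Suppose each of the N LEO observation satellites independently picks one of K processing/backhaul options, for example K = Z + 1 = 4 for 3 MEO satellites plus 1 GEO satellite. The joint action space then has K^N elements (4^8 = 65,536 for the simulated region), and an exhaustive search must score every one of them. The toy delay model below is invented purely so that the options interact: an option's delay is scaled by the number of satellites sharing it.

```python
from itertools import product


def joint_action_space_size(num_leo: int, options_per_leo: int) -> int:
    """Joint actions when each LEO satellite independently picks one option: K**N."""
    return options_per_leo ** num_leo


def exhaustive_best(base_delay):
    """Brute-force (traversal) search over all joint actions.

    base_delay[n][k] is a hypothetical delay for LEO satellite n using option k.
    The effective delay is multiplied by how many satellites share option k
    (a toy congestion model). Returns (best_assignment, best_average_delay).
    """
    num_leo, num_opts = len(base_delay), len(base_delay[0])
    best_assignment, best_avg = None, float("inf")
    for assignment in product(range(num_opts), repeat=num_leo):
        load = [assignment.count(k) for k in range(num_opts)]
        avg = sum(base_delay[n][k] * load[k] for n, k in enumerate(assignment)) / num_leo
        if avg < best_avg:
            best_assignment, best_avg = assignment, avg
    return best_assignment, best_avg


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, joint_action_space_size(n, 4))  # grows as 4**n
    print(exhaustive_best([[1.0, 1.5], [1.0, 1.2]]))
```

The exponential enumeration cost is what makes exhaustive search usable only as an offline benchmark, whereas a trained DQN selects an action with a single forward pass.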

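Conclusions: To address the problem of selecting links through multi-layer satellite nodes for LEO observation satellites, this paper proposes a deep reinforcement learning-based decision method for data compression, encryption, and backhaul. By designing the state, action, reward, and training network parameters to fit the scenario, the proposed method makes intelligent and efficient edge decisions aimed at low transmission delay.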

Key words: multi-layer satellite network, LEO satellite, edge decision, reinforcement learning, data encryption


CLC number: TN92

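Peiliang ZUO, Shaolong HOU, Chao GUO, Hua JIANG, Wenbo WANG. Security decision method for the edge of multi-layer satellite network based on reinforcement learning[J]. Journal on Communications, 2022, 43(6): 189-199.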


Figures and tables (8)

Figure 1

Figure 2

Table 1 Simulation parameter settings

Parameter	Value
Number of LEO, MEO, and GEO satellites	8, 3, 1
Discount factor γ, exploration factor ε	0.9, 0.005–0.900
ε decay rate, learning rate ξ	0.995, 0.01
Experience replay buffer size Γ, mini-batch size	500, 32
Observation data volume $α_{n}$, $n = 1, 2, \dots, N$ /MB	3–20
$γ_{n}^{L}$, $n = 1, 2, \dots, N$ /(KB·s⁻¹)	400–1200
$γ^{G}$, $γ_{z}^{M}$, $z = 1, 2, \dots, Z$	4 MB/s, 1500–3600 KB/s
$φ^{G}$, $φ_{z}^{M}$, $z = 1, 2, \dots, Z$	3 MB/s, 800–2400 KB/s
$λ^{G}$, $λ_{z}^{M}$, $z = 1, 2, \dots, Z$	6 MB/s, 3000–5500 KB/s
Compression ratio κ, bandwidth B /MHz	0.4, 2
SNR $η_{n}^{LG}$, $n = 1, 2, \dots, N$ /dB	5–20
SNR $η_{n,z}^{LM}$, $n = 1, 2, \dots, N$, $z = 1, 2, \dots, Z$ /dB	25–40
$β^{L}$, $β^{M}$, $β^{G}$ /ms	10–400, 1000, 1500
$ψ_{n,z}^{LM}$, $ψ_{n}^{LG}$, $n = 1, 2, \dots, N$, $z = 1, 2, \dots, Z$ /ms	10–15, 100–110
$ϖ^{L}$, $ϖ^{M}$, $ϖ^{G}$ /(MB·s⁻¹)	2, 4, 8
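To see roughly how the link parameters in Table 1 translate into delay, the sketch below computes a per-link backhaul delay. The model is an assumption, not the paper's delay formulation:

- the link rate is the Shannon capacity B·log2(1 + η), with η converted from dB;
- the payload is the observation data α_n reduced by the compression ratio κ;
- ψ is read as a propagation delay;
- 1 MB is taken as 8×10^6 bits.

Processing times on the nodes (the γ, φ, λ, β, and ϖ parameters) are left out because their exact roles are not defined in this section.

```python
import math


def shannon_rate_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon capacity B * log2(1 + SNR), with the SNR converted from dB to linear."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def link_delay_s(data_mb: float, kappa: float, bandwidth_hz: float,
                 snr_db: float, prop_delay_ms: float, bits_per_mb: float = 8e6) -> float:
    """Hypothetical delay = compressed payload / Shannon rate + propagation delay (seconds)."""
    payload_bits = data_mb * kappa * bits_per_mb
    return payload_bits / shannon_rate_bps(bandwidth_hz, snr_db) + prop_delay_ms / 1000.0


if __name__ == "__main__":
    # 10 MB observation, kappa = 0.4, B = 2 MHz; SNR and psi picked inside the Table 1 ranges.
    lm = link_delay_s(10, 0.4, 2e6, snr_db=30, prop_delay_ms=12)   # LEO -> MEO link
    lg = link_delay_s(10, 0.4, 2e6, snr_db=10, prop_delay_ms=105)  # LEO -> GEO link
    print(f"LEO-MEO: {lm:.2f} s, LEO-GEO: {lg:.2f} s")
```

With these illustrative inputs, the LEO-MEO link comes out at roughly 1.6 s and the LEO-GEO link at roughly 4.7 s. Most of the gap comes from the lower LEO-GEO SNR rather than the longer propagation delay. A complete model would also add compression and encryption time on the chosen nodes.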


Figure 3

Figure 4

Figure 5

Table 2

Figure 6


Method	10 ms	20 ms	30 ms	40 ms	50 ms	60 ms
DQN-ESD	0.793	0.834	0.861	0.888	0.901	0.901
DQN-ESD$^{MG}$	0.901	0.901	0.901	0.901	0.901	0.901
DQN-ESD$^{LM}$	0.811	0.855	0.895	0.924	0.951	0.958
DQN-ESD$^{LG}$	0.858	0.926	0.994	1.000	1.000	1.000
