Journal on Communications ›› 2022, Vol. 43 ›› Issue (6): 189-199. doi: 10.11959/j.issn.1000-436x.2022111

• Academic Papers •

Security decision method for the edge of multi-layer satellite network based on reinforcement learning

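Peiliang ZUO1, Shaolong HOU1,2, Chao GUO1, Hua JIANG1,2, Wenbo WANG3

  1. 1 Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China
    2 School of Communication Engineering, Xidian University, Xi'an, Shaanxi 710068, China
    3 School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China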
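  • Revised: 2022-04-25 Online: 2022-06-01 Published: 2022-06-01
  • About the authors: Peiliang ZUO (1991- ), male, born in Yantai, Shandong, Ph.D., lecturer at Beijing Electronic Science and Technology Institute. His main research interests are satellite communication, cognitive radio, Internet of Things, information security, and software-defined networking
    Shaolong HOU (1999- ), male, born in Yuanping, Shanxi, master's student at Xidian University. His main research interests are satellite communication and artificial intelligence
    Chao GUO (1987- ), female, born in Jiujiang, Jiangxi, lecturer at Beijing Electronic Science and Technology Institute. Her main research interests are satellite communication, emergency communication, transmission control, network load balancing, information security, and Internet of Things
    Hua JIANG (1962- ), male, born in Datong, Shanxi, professor at Beijing Electronic Science and Technology Institute. His main research interests are communication security, emergency communication, Internet of Things, and next-generation networks
    Wenbo WANG (1965- ), male, born in Anguo, Hebei, Ph.D., professor at Beijing University of Posts and Telecommunications. His main research interests are wireless communication, 3G/4G/5G/6G communication, satellite communication, cognitive radio, Internet of Things, information security, and software-defined networking
  • Funding:
    The National Natural Science Foundation of China (62001251); The National Natural Science Foundation of China (62001252); Beijing Universities "High-level, Precision and Advanced" Discipline Construction Fund (202100130401); Fund of the State Key Laboratory of Integrated Services Networks, Xidian University (ISN22-13)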

Security decision method for the edge of multi-layer satellite network based on reinforcement learning

Peiliang ZUO1, Shaolong HOU1,2, Chao GUO1, Hua JIANG1,2, Wenbo WANG3   

  1. 1 Department of Electronics and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China
    2 School of Communication Engineering, Xidian University, Xi’an 710068, China
    3 School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised:2022-04-25 Online:2022-06-01 Published:2022-06-01
  • Supported by:
    The National Natural Science Foundation of China (62001251); The National Natural Science Foundation of China (62001252); Beijing Universities "High-level, Precision and Advanced" Discipline Construction Fund (202100130401); Fund of the State Key Laboratory of Integrated Services Networks, Xidian University (ISN22-13)

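Abstract:

Objectives: Multi-layer satellite networks are an important component of space-air-ground integration technology. Relying on the autonomous decision-making capability of satellite nodes, this paper aims to exploit task collaboration at the network edge, both for processing sensed data (including encryption, decryption, and compression) and for backhauling it. With data security as the premise and low transmission delay as the goal, edge decision-making is realized for mission satellites in a multi-layer satellite network architecture.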

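Methods: This paper considers a multi-layer satellite network composed of low-orbit satellites, medium-orbit satellites, and high-orbit geosynchronous satellites. The low-orbit satellite nodes carry out observation and reconnaissance services (such as meteorological observation, geographic surveying, and intelligence reconnaissance). The medium-orbit satellites are regarded as fog nodes in the edge scenario, and one of them serves as the fog computing processing center, which plans on which satellite nodes the observation data are compressed and securely encrypted and which network is selected for data backhaul. The geosynchronous satellite has the largest coverage and the strongest computing capability. A deep reinforcement learning algorithm is used to implement edge security decisions for the satellite network. Specifically, the edge center node obtains the environment state of the satellite network through its sensing system and, using the self-learning capability of deep reinforcement learning, fits the optimal data offloading strategy for the scenario and obtains the optimal link plan, so that onboard resources are fully utilized and the average backhaul delay over the many observation tasks is minimized. First, the edge center node observes the environment and obtains state elements such as the data volume of the observation satellites' tasks, the channel conditions, and the processing capability of the edge nodes; on this basis, a deep Q network maps the state to an action, which gives a preliminary strategy selection. The strategy acts on the satellite network and changes the environment state, and the environment evaluates the strategy and feeds the evaluation back to the edge center node as a reward. Based on the new environment state and the reward, the edge center node computes the error and updates the Q values to optimize its action selection strategy, thereby obtaining higher rewards and new environment states. This process is iterated until the optimal strategy is obtained.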

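Results: Keras is used as the simulation platform, and the low-orbit constellation is assumed to be the common Walker constellation. A region of the multi-layer satellite network is taken as the simulation object, with 8 low-orbit observation satellites, 3 medium-orbit satellites, and 1 high-orbit satellite. The simulation results cover three aspects: 1) The convergence of each method on random snapshots is simulated for different numbers of satellites. The proposed method shows a convergence trend in every case; as the number of satellites grows, the number of training iterations needed to converge increases markedly, because more satellites greatly enlarge the method's action space. 2) The performance of the proposed method is compared under different network configurations. The proposed method has the best convergence performance under all 4 configurations. In some snapshots, however, the low-high network configuration starts out very well but converges poorly as training proceeds, because this configuration offers fewer link choices, which limits its performance. 3) The proposed method and the comparison methods are verified on a test set. Compared with random edge security decisions and edge security decisions guided by the signal-to-noise ratio, the proposed method has a clear advantage in delay performance, and it differs only slightly from the optimal edge security decision obtained by exhaustive traversal.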

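Conclusions: For the problem of selecting links among multi-layer satellite nodes on behalf of low-orbit observation satellites, this paper proposes a data compression and encrypted backhaul decision method based on deep reinforcement learning. By designing the state, action, reward, training network, and other related parameters to suit the scenario, the proposed method can make intelligent and efficient edge decisions with low transmission delay as the goal.

Key words: multi-layer satellite network, low-orbit satellite, edge decision, reinforcement learning, data encryption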

Abstract:

Objectives: The multi-layer satellite network is an important component of space-air-ground integration technology. This paper relies on the autonomous decision ability of satellite nodes to give full play to task collaboration in network edge scenarios, covering both the processing of sensing data (including encryption, decryption and compression) and its backhaul. With the premise of ensuring data security and the goal of low transmission delay, the edge decision-making of mission satellites in the multi-layer satellite network architecture is realized.

Methods: This paper considers a multi-layer satellite network consisting of low-orbit satellites, medium-orbit satellites, and high-orbit geosynchronous satellites. Among them, the low-orbit satellite nodes are responsible for observation and reconnaissance services (such as meteorological observation, geographic detection, and intelligence reconnaissance). The medium-orbit satellites are regarded as fog nodes in edge scenarios, and one of them serves as the fog computing processing center, responsible for planning the satellite nodes on which the observation data are compressed and securely encrypted, as well as the network selection for the data backhaul. The geosynchronous orbit satellite has the largest coverage and the strongest computing capability. This paper uses a deep reinforcement learning algorithm to implement edge security decisions for satellite networks. Specifically, the edge center node obtains the environment state of the satellite network through the perception system and, on this basis, uses the self-learning ability of deep reinforcement learning to fit the optimal data offloading strategy for the scenario and obtain the optimal link planning, so that the onboard resources are fully utilized and the average backhaul delay of the many observation tasks is minimized. First, the edge center node observes the environment and obtains state elements such as the task data volume of the observation satellites, the channel conditions, and the processing capability of the edge nodes. On this basis, the mapping from state to action is completed by a deep Q network, which realizes the preliminary strategy selection. The strategy acts on the satellite network and changes the state of the environment, and the environment evaluates the strategy and feeds the evaluation back to the edge center node in the form of a reward. Based on the new environment state and the reward, the edge center node performs the error calculation and updates the Q value in order to optimize the action selection strategy, so as to obtain higher rewards and new environment states. The above process is iterated continuously to finally obtain the optimal strategy.
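The observe, act, reward, and update loop described in the Methods can be illustrated with a minimal sketch. This is not the paper's implementation: the paper trains a deep Q network in Keras, whereas the sketch below swaps in a plain tabular Q-learning update so that it runs with the standard library alone. The delay table `TOY_DELAY`, the three coarse states, the three candidate decisions, the random snapshot dynamics, and all hyperparameters are hypothetical values chosen for illustration.

```python
import random

# Hypothetical return delay (seconds) for each (state, action) pair.
# Rows: coarse environment states (e.g. which link currently has the best channel).
# Columns: candidate decisions (e.g. which node compresses, encrypts and relays the data).
TOY_DELAY = [
    [4.0, 2.0, 6.0],
    [1.5, 3.5, 5.0],
    [5.0, 4.5, 2.5],
]


def train_q_table(delay, steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning on the toy delay table (a stand-in for a deep Q network)."""
    rng = random.Random(seed)
    n_states, n_actions = len(delay), len(delay[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = rng.randrange(n_states)
    for _ in range(steps):
        # Map the observed state to an action with an epsilon-greedy rule.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        # The environment evaluates the decision: reward is the negative delay.
        reward = -delay[state][action]
        # Toy dynamics (assumption): the next snapshot is drawn uniformly at random.
        next_state = rng.randrange(n_states)
        # Error calculation (temporal-difference error) and Q-value update.
        td_error = reward + gamma * max(q[next_state]) - q[state][action]
        q[state][action] += alpha * td_error
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = train_q_table(TOY_DELAY)
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy selects the lowest-delay column in each row of `TOY_DELAY`. A deep Q network replaces the table with a function approximator when the state space (data volumes, channel conditions, node capabilities) is too large to enumerate.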

Results: Keras is used as the simulation platform, and in the simulation experiment the constellation of low-orbit satellites is assumed to be the common Walker constellation. Taking a certain area in the multi-layer satellite network as the simulation object, the number of low-orbit observation satellites in this area is set to 8, the number of medium-orbit satellites to 3, and the number of high-orbit satellites to 1. The simulation results include three aspects: 1) The convergence performance of each method on random snapshots is simulated for different numbers of satellites. The simulation results show that the proposed method shows a convergence trend for all numbers of satellites considered. As the number of satellites increases, the number of training iterations required for the proposed method to converge increases significantly, because a larger number of satellites greatly enlarges the action space of the method. 2) The performance of the proposed method under different network configurations is compared. The simulation results show that the proposed method has the best convergence performance under all 4 configurations. In some snapshots, however, the initial performance of the low-high network configuration is excellent, but its convergence performance becomes poor as the training progresses, because this configuration has fewer link choices, which limits its performance. 3) The performance of the proposed method and the comparison methods is verified by simulation on the test set. The simulation results show that, compared with the random edge security decision and the edge security decision guided by the signal-to-noise ratio, the proposed method has a clear advantage in delay performance and differs only slightly from the optimal edge security decision obtained by exhaustive traversal.
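The three comparison baselines named in item 3 (random decision, signal-to-noise-ratio-oriented decision, and the traversal optimum) can be sketched on a toy example. The delay model here is an assumption and is not taken from the paper: return delay is modeled as transmission time at the Shannon rate plus a processing time for compression and encryption. The `Route` fields, the route names, the bandwidth, the cycles-per-bit figure, and every numeric value are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One hypothetical candidate: a relay/processing node and its link quality."""
    name: str
    snr_db: float   # link signal-to-noise ratio in dB
    cpu_hz: float   # processing capability of the node in cycles per second


def return_delay(route, data_bits, bandwidth_hz=10e6, cycles_per_bit=500.0):
    """Assumed delay model: Shannon-rate transmission time plus processing time."""
    snr_linear = 10 ** (route.snr_db / 10)
    rate_bps = bandwidth_hz * math.log2(1 + snr_linear)
    transmission = data_bits / rate_bps
    processing = data_bits * cycles_per_bit / route.cpu_hz
    return transmission + processing


def random_decision(routes, rng):
    """Baseline 1: pick any route uniformly at random."""
    return rng.choice(routes)


def snr_oriented_decision(routes):
    """Baseline 2: pick the route with the best channel, ignoring processing."""
    return max(routes, key=lambda r: r.snr_db)


def exhaustive_decision(routes, data_bits):
    """Reference optimum: traverse every route and keep the lowest delay."""
    return min(routes, key=lambda r: return_delay(r, data_bits))


ROUTES = [
    Route("via_MEO_1", snr_db=20.0, cpu_hz=1e9),
    Route("via_MEO_2", snr_db=15.0, cpu_hz=5e9),
    Route("via_GEO", snr_db=8.0, cpu_hz=2e10),
]
DATA_BITS = 8e6

rng = random.Random(0)
random_route = random_decision(ROUTES, rng)
snr_route = snr_oriented_decision(ROUTES)
best_route = exhaustive_decision(ROUTES, DATA_BITS)
```

With these made-up numbers the SNR-oriented rule picks `via_MEO_1`, whose strong link is outweighed by its slow processing, while traversal picks `via_GEO`. This only illustrates why a channel-only rule can lose to a decision that also accounts for node processing capability; it makes no claim about the paper's measured results.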

Conclusions: For the problem of selecting links among multi-layer satellite nodes on behalf of low-orbit observation satellites, this paper proposes a data compression and encrypted backhaul decision method based on deep reinforcement learning. By rationally designing the state, action, reward, training network, and other related parameters of the method in combination with the scenario, the proposed method can make intelligent and efficient edge decisions with the goal of low transmission delay.

Key words: multi-layer satellite network, LEO satellite, edge decision, reinforcement learning, data encryption

