Improved satellite resource allocation algorithm based on deep reinforcement learning and multi-objective optimization

doi:10.11959/j.issn.1000-436x.2020117

Abstract

To address the multi-objective optimization (MOP) problem that arises in sequential decision-making for resource allocation in multi-beam satellite systems, a deep reinforcement learning (DRL) based algorithm, DRL-MOP, is proposed to improve system performance and user satisfaction. The optimization goal is the normalized weighted sum of spectrum efficiency, energy efficiency, and satisfaction index. The algorithm models the dynamically changing system environment and the user arrival process, and optimizes the cumulative performance of the satellite system by combining DRL and MOP. Simulation results show that the proposed algorithm solves the MOP problem with fast convergence and low complexity, and that it outperforms the compared algorithms in both system performance and user satisfaction.
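The normalized weighted-sum objective named in the abstract can be illustrated with a minimal Python sketch. The min-max normalization, the bounds, the weights, and the names `normalize` and `weighted_objective` are all assumptions made for illustration; the abstract does not state how the three metrics are normalized or weighted.

```python
def normalize(value, lower, upper):
    """Min-max normalize a metric into [0, 1] (assumed normalization)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    scaled = (value - lower) / (upper - lower)
    return min(max(scaled, 0.0), 1.0)


def weighted_objective(metrics, bounds, weights):
    """Weighted sum of normalized metrics; the weights must sum to 1."""
    if set(metrics) != set(bounds) or set(metrics) != set(weights):
        raise ValueError("metrics, bounds and weights need the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        weights[key] * normalize(metrics[key], *bounds[key])
        for key in metrics
    )


# Hypothetical values: spectrum efficiency (se), energy efficiency (ee),
# and satisfaction index (si), each with an assumed value range.
metrics = {"se": 2.0, "ee": 5.0, "si": 0.5}
bounds = {"se": (0.0, 4.0), "ee": (0.0, 10.0), "si": (0.0, 1.0)}
weights = {"se": 0.5, "ee": 0.25, "si": 0.25}
score = weighted_objective(metrics, bounds, weights)
```

Normalizing each metric before weighting keeps quantities with different units on a common scale, so that the weights express relative priority and not unit size.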

Key words: multi-beam satellite system, resource allocation, sequential decision-making, deep reinforcement learning, multi-objective optimization

CLC Number: TN927+.2

Pei ZHANG,Shuaijun LIU,Zhiguo MA,Xiaohui WANG,Junde SONG. Improved satellite resource allocation algorithm based on DRL and MOP[J]. Journal on Communications, 2020, 41(6): 51-60.


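Simulation parameter	Value
Downlink operating frequency/MHz	1542
Number of beams	37
System bandwidth/MHz	5
Carrier bandwidth/kHz	312.5
Number of carriers	16
Maximum antenna gain/dBi	41.6
User terminal EIRP/dBW	7, 11
Ratio of receive antenna gain G to receive system noise temperature T/(dB/K)	-24, -22
User service arrival rate/(arrivals·min^-1)	70
Convolution kernel K₁	7
Convolution kernel K₂	2
Convolutional layer 1 H₁	3
Convolutional layer 2 H₂	2
Convolutional layer 1 C₀×C₁	1×16
Convolutional layer 2 C₁×C₂	16×32
Fully connected layer 1 X₁×Y₁	128×128
Fully connected layer 2 X₂×Y₂	128×16
Sampling batch size	4
Discount factor	0.9
Update step	50
Learning rate	0.001
Initial exploration probability	1
Final exploration probability	0.01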
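
Two entries in the parameter table above lend themselves to a quick check in Python: the carrier count follows from the system and carrier bandwidths, and the exploration probability moves from 1 to 0.01 during training. The sketch below assumes a linear decay and a hypothetical `decay_steps` horizon; the table gives only the two endpoint values, not the schedule.

```python
def carrier_count(system_bw_hz, carrier_bw_hz):
    """Number of whole carriers that fit into the system bandwidth."""
    return int(system_bw_hz // carrier_bw_hz)


def exploration_probability(step, decay_steps, eps_start=1.0, eps_end=0.01):
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# 5 MHz system bandwidth divided into 312.5 kHz carriers.
n_carriers = carrier_count(5e6, 312.5e3)

# decay_steps=1000 is a hypothetical horizon, not a value from the table.
eps_first = exploration_probability(0, 1000)
eps_middle = exploration_probability(500, 1000)
eps_last = exploration_probability(5000, 1000)
```

The carrier count computed this way matches the 16 carriers listed in the table.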

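Network layer	Input	Convolution kernel	Bias	Weights	Output
Convolutional layer 1	9×9, M_c+1	7×7, 16	3×3, 16	-	3×3, 16
Convolutional layer 2	3×3, 16	2×2, 32	2×2, 32	-	2×2, 32
Fully connected layer 1	128	-	128×1	128×128	128
Fully connected layer 2	128	-	16×1	16×128	M_c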
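
The layer sizes in the table above are mutually consistent if both convolutions use stride 1 and no padding, which is an assumption here since the table does not state either. A short Python sketch of that arithmetic, using the hypothetical helpers `conv_output_size` and `network_shapes`:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def network_shapes(input_size=9, k1=7, k2=2, c1=16, c2=32, n_actions=16):
    """Trace feature-map sizes through the two conv and two FC layers."""
    h1 = conv_output_size(input_size, k1)  # after convolutional layer 1
    h2 = conv_output_size(h1, k2)          # after convolutional layer 2
    flattened = h2 * h2 * c2               # input to fully connected layer 1
    return {
        "conv1_out": (h1, h1, c1),
        "conv2_out": (h2, h2, c2),
        "fc1_in": flattened,
        "fc2_out": n_actions,              # M_c, taken as 16 carriers
    }


shapes = network_shapes()
```

Under these assumptions the 9×9 input shrinks to 3×3 and then 2×2, and flattening the 2×2×32 feature map gives the 128 inputs of fully connected layer 1.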
