Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning

doi:10.11959/j.issn.2096-3750.2020.00177

Abstract

The Internet of things (IoT) era requires wide coverage and connectivity for IoT nodes. However, existing IoT communication technology cannot collect data in a timely manner in remote areas. Owing to their flexibility and mobility, unmanned aerial vehicles (UAVs) have been widely used for data collection in IoT wireless sensor networks. This paper studies trajectory design for UAV-assisted sensor network data collection while also meeting the UAV's charging demand during the collection process. Specifically, based on hierarchical reinforcement learning with temporal abstraction, a novel option-DQN (option-deep Q-learning) algorithm for discrete actions is proposed to improve the performance of data collection and trajectory design, and to make the UAV recharge in time so that it can keep flying normally. Simulation results show that the proposed method achieves much higher training rewards and much faster training than the conventional DQN (deep Q-learning) algorithm. In addition, the proposed algorithm ensures a sufficient power supply for the UAV by controlling it to recharge in a timely manner.
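
As a rough illustration of what "temporal abstraction" means in this setting, the sketch below shows a tabular Q-learning update over options in the semi-MDP style of Sutton, Precup and Singh. It is not the paper's option-DQN, which uses a deep network; the option names, the learning rate `ALPHA`, and the state labels are assumptions made purely for illustration. Only the discount factor 0.95 comes from the article's parameter table.

```python
import random
from collections import defaultdict

GAMMA = 0.95  # reward discount factor (value listed in the article's parameter table)
ALPHA = 0.1   # learning rate (assumed for illustration)

# Hypothetical option set: each option runs for several primitive time steps.
OPTIONS = ("collect_data", "recharge")


def select_option(q, state, epsilon, rng=random):
    """Epsilon-greedy choice over options rather than primitive actions."""
    if rng.random() < epsilon:
        return rng.choice(OPTIONS)
    return max(OPTIONS, key=lambda o: q[(state, o)])


def discounted_return(rewards, gamma=GAMMA):
    """Discounted sum of the primitive-step rewards earned while an option ran."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def smdp_q_update(q, state, option, rewards, next_state, alpha=ALPHA, gamma=GAMMA):
    """One semi-MDP Q-learning update after an option that lasted len(rewards) steps."""
    k = len(rewards)
    # Bootstrap from the best option in the state where the option terminated,
    # discounted by gamma**k because k primitive steps have elapsed.
    best_next = max(q[(next_state, o)] for o in OPTIONS)
    target = discounted_return(rewards, gamma) + (gamma ** k) * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


# Usage: a "recharge" option that took three steps and ended in state "s1".
q = defaultdict(float)
new_value = smdp_q_update(q, "s0", "recharge", [-1.0, -1.0, 5.0], "s1")
```

The key difference from a one-step update is the `gamma ** k` factor: the value of the next state is discounted by the full duration of the option, which is what lets the agent reason about multi-step behaviours such as flying to a charging station as a single decision.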

Key words: UAV, trajectory design, data collection, charging

CLC Number: TN92

Zhiyu MOU, Yu ZHANG, Dian FAN, Jun LIU, Feifei GAO. Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2020, 4(3): 42-51.

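Parameter	Description	Value
f_c	Carrier frequency	2.5 GHz
(a, b, η_LoS, η_NLoS)	Suburban environment	(4.88, 0.43, 0.1, 21)
	Urban environment	(9.61, 0.16, 1, 20)
	Dense urban environment	(12.08, 0.11, 1.6, 23)
	Highly dense urban environment	(27.23, 0.08, 2.3, 34)
γ	Reward discount factor	0.95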
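
The tuple (a, b, η_LoS, η_NLoS) has the same form as the parameters of the widely used probabilistic air-to-ground channel model of Al-Hourani et al. (2014). Assuming that is the model intended here (this page does not reproduce the channel equations), the sketch below shows how such parameters are typically turned into a line-of-sight probability and a mean path loss. The function names and the environment keys are hypothetical; the numeric values are copied from the table.

```python
import math

C = 299_792_458.0  # speed of light, m/s
F_C = 2.5e9        # carrier frequency from the table, Hz

# (a, b, eta_LoS, eta_NLoS) per environment, copied from the parameter table.
ENV = {
    "suburban": (4.88, 0.43, 0.1, 21),
    "urban": (9.61, 0.16, 1, 20),
    "dense_urban": (12.08, 0.11, 1.6, 23),
    "highly_dense_urban": (27.23, 0.08, 2.3, 34),
}


def los_probability(height_m, ground_dist_m, a, b):
    """LoS probability as a sigmoid of the elevation angle (in degrees)."""
    theta = math.degrees(math.atan2(height_m, ground_dist_m))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


def mean_path_loss_db(height_m, ground_dist_m, env="urban", f_c=F_C):
    """Free-space loss plus the probability-weighted excess loss, in dB."""
    a, b, eta_los, eta_nlos = ENV[env]
    d = math.hypot(height_m, ground_dist_m)  # UAV-to-node distance, m
    fspl = 20.0 * math.log10(4.0 * math.pi * f_c * d / C)
    p_los = los_probability(height_m, ground_dist_m, a, b)
    return fspl + p_los * eta_los + (1.0 - p_los) * eta_nlos


# Usage: a UAV at 100 m altitude, directly overhead versus 1 km away on the ground.
loss_overhead = mean_path_loss_db(100.0, 0.0, env="urban")
loss_far = mean_path_loss_db(100.0, 1000.0, env="urban")
```

Under this model the UAV sees a higher line-of-sight probability, and therefore a lower mean path loss, as it moves closer to hovering directly above a sensor node, which is one reason trajectory design matters for data collection.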

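Metric	option-DQN algorithm	DQN algorithm
Total flight time (s)	162	519
Number of nodes collected	20	15
All nodes collected?	Yes	No
Number of recharges	1	3
Low-battery state during flight?	Yes	No
Negative-battery state during flight?	No	No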

