Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning

doi:10.11959/j.issn.2096-3750.2020.00177

Chinese Journal on Internet of Things, 2020, Vol. 4, Issue 3: 42-51.

Special topic: Smart Transportation Internet of Things

Zhiyu MOU¹,²; Yu ZHANG¹,²; Dian FAN³; Jun LIU²,⁴; Feifei GAO¹,²

¹ Department of Automation, Tsinghua University, Beijing 100084, China
² Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
³ Department of Strategic Planning & Research of CTTL-Terminals, China Academy of Information and Communications Technology, Beijing 100191, China
⁴ Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China

Revised: 2020-07-02  Online: 2020-09-30  Published: 2020-09-07

About the authors:
Zhiyu MOU (1997- ), male, born in Shijiazhuang, Hebei, is a master's student at Tsinghua University. His main research interest is UAV path planning based on deep reinforcement learning.
Yu ZHANG (1993- ), female, born in Zhengzhou, Henan, is a Ph.D. student at Tsinghua University. Her main research interests are IoT communication theory and UAV path planning based on reinforcement learning.
Dian FAN (1992- ), male, born in Heze, Shandong, is an engineer in the Department of Strategic Planning & Research of CTTL-Terminals, China Academy of Information and Communications Technology. His main research interests are millimeter-wave massive multi-antenna communication theory, array signal processing, and UAV communication theory.
Jun LIU (1982- ), female, born in Jinan, Shandong, holds a Ph.D. and is an assistant researcher at Tsinghua University. Her main research interests are space-terrestrial integrated networks and UAV networking.
Feifei GAO (1980- ), male, born in Xi'an, Shaanxi, holds a Ph.D. and is an associate professor and doctoral supervisor at Tsinghua University. His main research interests are multi-antenna communication and intelligent signal processing.

Supported by:
The National Key R&D Program of China (2018AAA0102401); The China Academy of Information and Communications Technology Youth Project 2020; The Tsinghua University Independent Research Project (2019Z08QCX19); The National Natural Science Foundation of China (61902214); The Beijing Natural Science Foundation (4182030); The Beijing Natural Science Foundation (L182042)

Abstract

The Internet of things (IoT) era calls for massive node coverage and connectivity, yet in some remote areas existing IoT communication technologies cannot collect data in a timely manner. Unmanned aerial vehicles (UAVs) are flexible and mobile, which makes them well suited to collecting data from the wireless sensor networks in the IoT. This paper studies trajectory design for UAV-aided data collection in sensor networks while also meeting the UAV's need to recharge, which arises from its limited battery capacity. Specifically, drawing on hierarchical reinforcement learning with temporal abstraction and building on a discrete-action deep reinforcement learning architecture, a novel option-DQN (option-deep Q-learning) algorithm is proposed. It achieves efficient data collection and trajectory design and directs the UAV to recharge in time so that it can keep flying normally. Simulation results show that, compared with the conventional DQN (deep Q-learning) algorithm, the episode reward of the proposed algorithm rises faster during training and settles at a higher level, and the UAV's trajectory during the task is clearer and more reasonable. The proposed algorithm can also decide when the UAV should recharge, keeping its battery sufficiently charged throughout.

Key words: UAV, trajectory design, data collection, charging
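The abstract names temporal abstraction (options) on top of a discrete-action DQN but does not give the update rule. As a rough illustration only, the sketch below shows the standard semi-MDP learning target for an option that runs for several primitive steps, plus an epsilon-greedy choice over options. It is not the authors' implementation: the function names, the idea that each option is something like "fly to a sensor and collect" or "fly to the charger", and the exploration rule are all assumptions made for this example. Only the discount factor 0.95 comes from the article's parameter table.

```python
import random
from typing import Sequence

GAMMA = 0.95  # reward discount factor reported in the article's parameter table


def option_td_target(step_rewards: Sequence[float],
                     next_option_values: Sequence[float],
                     terminal: bool,
                     gamma: float = GAMMA) -> float:
    """Learning target for an option that ran for len(step_rewards) steps.

    Hypothetical helper: follows the generic semi-MDP options formulation,
    not a rule confirmed by the article.
    """
    # Discounted sum of the rewards collected while the option was executing.
    cumulative = sum(gamma ** i * r for i, r in enumerate(step_rewards))
    if terminal:
        return cumulative
    # Bootstrap from the best option in the state where this option ended,
    # discounted by the number of primitive steps the option took.
    k = len(step_rewards)
    return cumulative + gamma ** k * max(next_option_values)


def select_option(option_values: Sequence[float],
                  epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice over option indices (assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.randrange(len(option_values))
    return max(range(len(option_values)), key=lambda i: option_values[i])
```

In a deep variant, `next_option_values` would be the output of a target network evaluated at the state where the option terminated, and the returned target would be regressed against the online network's value for the chosen option.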

CLC number: TN92

Zhiyu MOU, Yu ZHANG, Dian FAN, Jun LIU, Feifei GAO. Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2020, 4(3): 42-51.

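Table 1
Parameter	Description	Value
f_c	Carrier frequency	2.5 GHz
(a, b, η_LoS, η_NLoS)	Suburban environment	(4.88, 0.43, 0.1, 21)
(a, b, η_LoS, η_NLoS)	Urban environment	(9.61, 0.16, 1, 20)
(a, b, η_LoS, η_NLoS)	Dense urban environment	(12.08, 0.11, 1.6, 23)
(a, b, η_LoS, η_NLoS)	Highly dense (high-rise) urban environment	(27.23, 0.08, 2.3, 34)
γ	Reward discount factor	0.95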
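The page does not show how the environment parameters (a, b, η_LoS, η_NLoS) enter the channel model. Their values match the widely used air-to-ground model of Al-Hourani et al., so the sketch below assumes that model: the line-of-sight probability is a sigmoid of the elevation angle in degrees, and the mean path loss is free-space loss plus a probability-weighted excess loss in dB. Treat this as an assumption about the article, not a statement of its equations. The function names, dictionary keys, and the example geometry are hypothetical.

```python
import math

C = 299_792_458.0   # speed of light, m/s
F_C = 2.5e9         # carrier frequency from the table, Hz

# (a, b, eta_LoS [dB], eta_NLoS [dB]) as listed in the table.
ENVIRONMENTS = {
    "suburban": (4.88, 0.43, 0.1, 21.0),
    "urban": (9.61, 0.16, 1.0, 20.0),
    "dense_urban": (12.08, 0.11, 1.6, 23.0),
    "highrise_urban": (27.23, 0.08, 2.3, 34.0),
}


def los_probability(height_m: float, ground_dist_m: float,
                    a: float, b: float) -> float:
    """Probability of a line-of-sight link (assumed sigmoid model)."""
    # Elevation angle of the UAV as seen from the ground node, in degrees.
    theta = math.degrees(math.atan2(height_m, ground_dist_m))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


def mean_path_loss_db(height_m: float, ground_dist_m: float,
                      env: str, f_c: float = F_C) -> float:
    """Mean air-to-ground path loss in dB under the assumed model."""
    a, b, eta_los, eta_nlos = ENVIRONMENTS[env]
    distance = math.hypot(height_m, ground_dist_m)
    # Free-space path loss at the carrier frequency.
    fspl = 20.0 * math.log10(4.0 * math.pi * f_c * distance / C)
    p_los = los_probability(height_m, ground_dist_m, a, b)
    # Excess loss averaged over the LoS and NLoS cases.
    return fspl + p_los * eta_los + (1.0 - p_los) * eta_nlos
```

For example, `mean_path_loss_db(100.0, 100.0, "suburban")` and `mean_path_loss_db(100.0, 100.0, "highrise_urban")` compare the same 45-degree link in the two extreme environments; under this model the high-rise case is lossier because line of sight is much less likely.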

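Table 2
Metric	option-DQN algorithm	DQN algorithm
Total flight time (s)	162	519
Number of nodes collected	20	15
All nodes collected	Yes	No
Number of recharges	1	3
Low-battery state encountered en route	Yes	No
Negative-battery state encountered en route	No	No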
