Chinese Journal on Internet of Things, 2020, Vol. 4, Issue (3): 42-51. doi: 10.11959/j.issn.2096-3750.2020.00177

• Special Topic: Internet of Things for Smart Transportation

Research on UAV-aided data collection and trajectory design based on deep reinforcement learning

Zhiyu MOU1,2, Yu ZHANG1,2, Dian FAN3, Jun LIU2,4, Feifei GAO1,2

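  1 Department of Automation, Tsinghua University, Beijing 100084, China
  2 Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  3 Department of Strategic Planning & Research of CTTL-Terminals, China Academy of Information and Communications Technology, Beijing 100191, China
  4 Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China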
  • Revised: 2020-07-02; Publication date: 2020-09-30; Online date: 2020-09-07
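  • About the authors:
    Zhiyu MOU (1997- ), male, born in Shijiazhuang, Hebei, is a master's student at Tsinghua University. His research interest is UAV trajectory design based on deep reinforcement learning.
    Yu ZHANG (1993- ), female, born in Zhengzhou, Henan, is a Ph.D. candidate at Tsinghua University. Her research interests include IoT communication theory and UAV trajectory design based on reinforcement learning.
    Dian FAN (1992- ), male, born in Heze, Shandong, is an engineer in the Department of Strategic Planning & Research of CTTL-Terminals, China Academy of Information and Communications Technology. His research interests include millimeter-wave massive multi-antenna communication theory, array signal processing, and UAV communication theory.
    Jun LIU (1982- ), female, born in Jinan, Shandong, Ph.D., is an assistant research fellow at Tsinghua University. Her research interests include space-ground integrated networks and UAV networking.
    Feifei GAO (1980- ), male, born in Xi'an, Shaanxi, Ph.D., is an associate professor and doctoral supervisor at Tsinghua University. His research interests include multi-antenna communications and intelligent signal processing.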
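  • Supported by:
    National Key R&D Program of China (2018AAA0102401); China Academy of Information and Communications Technology Youth Project 2020; Tsinghua University Independent Research Project (2019Z08QCX19); National Natural Science Foundation of China (61902214); Beijing Natural Science Foundation (4182030); Beijing Natural Science Foundation (L182042)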

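Abstract:

The Internet of Things (IoT) era requires coverage of and connectivity for a massive number of nodes, yet in some remote areas IoT communication technologies cannot collect data in a timely manner. Thanks to their flexibility and mobility, unmanned aerial vehicles (UAVs) can be used to collect data from wireless sensor networks in the IoT. The proposed scheme focuses on trajectory design for a UAV that assists data collection in a sensor network, while also meeting the UAV's own charging demand that arises from its limited battery capacity. Specifically, drawing on the idea of hierarchical reinforcement learning with temporal abstraction and building on a discrete-action deep reinforcement learning framework, a novel option-DQN (option-deep Q-learning) algorithm was proposed. It achieves efficient UAV data collection and trajectory design while directing the UAV to recharge in time so that it can keep flying normally. Simulation results show that, compared with the conventional DQN (deep Q-learning) algorithm, the proposed algorithm increases the episode reward faster during training and reaches a higher final episode reward, and the UAV follows clearer and more reasonable trajectories while performing its task. The proposed algorithm can also decide when the UAV should recharge, thereby ensuring that the UAV always has sufficient battery power.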

Key words: UAV, trajectory design, data collection, charging
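The abstract describes option-DQN only at a high level. It uses hierarchical reinforcement learning with temporal abstraction, in which the agent chooses among multi-step options instead of single moves. The sketch below illustrates that general idea and is not the authors' implementation. It shows the standard SMDP Q-learning update for options, with a lookup table standing in for the deep Q-network. The option names `collect_s1`, `collect_s2` and `recharge` are hypothetical, and so are the helper functions `smdp_q_update` and `select_option`.

```python
from collections import defaultdict
import random

# Hypothetical option set (illustrative assumption, not from the paper):
# one "fly to sensor i and collect its data" option per sensor, plus a
# "fly to the charging station and recharge" option.
OPTIONS = ["collect_s1", "collect_s2", "recharge"]


def select_option(Q, state, options, epsilon, rng):
    """Epsilon-greedy choice among options (not among primitive moves)."""
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda o: Q[(state, o)])


def smdp_q_update(Q, state, option, rewards, next_state, options,
                  alpha=0.1, gamma=0.99):
    """SMDP Q-learning update after `option` ran for len(rewards) steps.

    rewards: per-step rewards collected while the option was executing.
    """
    k = len(rewards)
    # Discounted reward accumulated inside the option.
    ret = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap from the best option at the state where this option ended,
    # discounted by gamma**k because k primitive time steps have elapsed.
    best_next = max(Q[(next_state, o)] for o in options)
    target = ret + (gamma ** k) * best_next
    Q[(state, option)] += alpha * (target - Q[(state, option)])
    return Q[(state, option)]


if __name__ == "__main__":
    Q = defaultdict(float)
    # The "collect_s1" option ran 3 steps with rewards 1, 0, 2.
    v = smdp_q_update(Q, "start", "collect_s1", [1.0, 0.0, 2.0], "at_s1",
                      OPTIONS, alpha=0.5, gamma=0.5)
    print(v)  # 0.75
    rng = random.Random(0)
    print(select_option(Q, "start", OPTIONS, epsilon=0.0, rng=rng))  # collect_s1
```

A DQN-style variant would fit a neural network to the same target, r_total + γ^k · max_o' Q(s', o'), typically using samples drawn from a replay buffer. The abstract does not state how the paper defines its options, rewards, or termination conditions, so those parts of the sketch are placeholders.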
