Chinese Journal on Internet of Things ›› 2020, Vol. 4 ›› Issue (3): 42-51. doi: 10.11959/j.issn.2096-3750.2020.00177

• Topic: IoT in Intelligent Transportation

Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning

Zhiyu MOU 1,2, Yu ZHANG 1,2, Dian FAN 3, Jun LIU 2,4, Feifei GAO 1,2

  1 Department of Automation, Tsinghua University, Beijing 100084, China
    2 Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
    3 Department of Strategic Planning & Research of CTTL-Terminals, China Academy of Information and Communications Technology, Beijing 100191, China
    4 Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
  • Revised: 2020-07-02  Online: 2020-09-30  Published: 2020-09-07
  • Supported by:
    The National Key R&D Program of China (2018AAA0102401); The China Academy of Information and Communications Technology Youth Project 2020; The Tsinghua University Independent Research Project (2019Z08QCX19); The National Natural Science Foundation of China (61902214); The Beijing Natural Science Foundation (4182030); The Beijing Natural Science Foundation (L182042)


The Internet of things (IoT) era requires wide coverage and massive connectivity for IoT nodes. However, conventional IoT communication technologies cannot collect data from remote areas in a timely manner. Thanks to its flexibility and mobility, the UAV has been widely used for data collection in IoT wireless sensor networks. The trajectory design for UAV-assisted sensor-network data collection was studied, while the UAV's charging demand during the data collection process was also satisfied. Specifically, based on hierarchical reinforcement learning with temporal abstraction, a novel option-DQN (option-deep Q-network) algorithm targeted at discrete actions was proposed to improve the performance of data collection and trajectory design, and to control the UAV to recharge in time so as to ensure its normal flight. The simulation results show that both the training reward and the convergence speed of the proposed method are much better than those of the conventional DQN algorithm. Besides, the proposed algorithm can guarantee a sufficient power supply for the UAV by controlling it to recharge in a timely manner.
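The core idea behind the option-DQN is temporal abstraction: the agent chooses among temporally extended options (e.g., "fly to a sensor", "return to the charger") rather than primitive moves. The sketch below is a minimal tabular stand-in for that idea, using SMDP Q-learning over two hand-crafted option types in a toy 1-D world with a battery constraint; the paper's actual algorithm uses a deep Q-network, and all positions, rewards, and the battery budget here are illustrative assumptions, not values from the paper.

```python
import random
from collections import defaultdict

# Toy 1-D world (assumed for illustration): charger at position 0,
# one sensor on each side, a battery too small to visit both sensors
# without recharging in between.
CHARGER, SENSORS, BATTERY_MAX = 0, (-3, 3), 6
GAMMA, ALPHA, EPS, EPISODES = 0.95, 0.3, 0.2, 4000
OPTIONS = ("visit:-3", "visit:3", "recharge")


def run_option(pos, bat, collected, option):
    """Execute an option until it terminates; return (state', reward, steps, dead)."""
    target = CHARGER if option == "recharge" else int(option.split(":")[1])
    reward, steps = 0.0, 0
    while pos != target:
        pos += 1 if target > pos else -1
        bat -= 1
        steps += 1
        reward -= 0.1                          # small per-step flight cost
        if bat < 0:                            # battery exhausted mid-flight
            return (pos, bat, collected), reward - 20.0, steps, True
    if option == "recharge":
        bat = BATTERY_MAX                      # refill at the charger
    elif target not in collected:
        collected, reward = collected | {target}, reward + 10.0  # data collected
    return (pos, bat, collected), reward, steps, False


def train(seed=0):
    random.seed(seed)
    Q = defaultdict(float)                     # Q over (state, option) pairs
    for _ in range(EPISODES):
        state = (CHARGER, BATTERY_MAX, frozenset())
        for _ in range(30):                    # cap episode length
            if random.random() < EPS:
                opt = random.choice(OPTIONS)
            else:
                opt = max(OPTIONS, key=lambda o: Q[(state, o)])
            nxt, r, k, dead = run_option(*state, opt)
            done = dead or len(nxt[2]) == len(SENSORS)
            target_q = 0.0 if done else max(Q[(nxt, o)] for o in OPTIONS)
            # SMDP Q-learning update: a k-step option is discounted by gamma**k
            Q[(state, opt)] += ALPHA * (r + GAMMA ** k * target_q - Q[(state, opt)])
            if done:
                break
            state = nxt
    return Q


def greedy_rollout(Q):
    """Follow the learned option policy greedily from the start state."""
    state, trace = (CHARGER, BATTERY_MAX, frozenset()), []
    for _ in range(8):
        opt = max(OPTIONS, key=lambda o: Q[(state, o)])
        trace.append(opt)
        state, _, _, dead = run_option(*state, opt)
        if dead or len(state[2]) == len(SENSORS):
            break
    return state, trace
```

The `gamma ** k` discount in the update is what distinguishes option-level learning from the one-step updates of a conventional DQN: each option is credited for its entire multi-step outcome, so the learned policy naturally interleaves recharging trips with data-collection trips instead of reasoning move by move.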

Key words: UAV, trajectory design, data collection, charging

