通信学报 (Journal on Communications) ›› 2018, Vol. 39 ›› Issue (4): 35-44. doi: 10.11959/j.issn.1000-436x.2018058

• Academic paper

Transmission scheduling scheme based on deep Q learning in wireless network

Jiang ZHU, Tingting WANG, Yonghui SONG, Yali LIU

  1. Key Laboratory of Information and Communication Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065
  • Revised: 2018-03-16 Online: 2018-04-01 Published: 2018-04-29
  • About the authors: Jiang ZHU (1977-), male, born in Jingzhou, Hubei; Ph.D., professor at Chongqing University of Posts and Telecommunications; main research interests: cognitive radio, mobile communication, and network security situation awareness. | Tingting WANG (1993-), female, born in Anqing, Anhui; master's student at Chongqing University of Posts and Telecommunications; main research interest: network security situation awareness. | Yonghui SONG (1991-), male, born in Handan, Hebei; master's student at Chongqing University of Posts and Telecommunications; main research interest: cognitive radio. | Yali LIU (1990-), male, born in Shangqiu, Henan; master's student at Chongqing University of Posts and Telecommunications; main research interest: cognitive radio.
  • Supported by:
    The National Natural Science Foundation of China (61102062); The National Natural Science Foundation of China (61271260); The National Natural Science Foundation of China (61301122); Chongqing Research Program of Basic Research and Frontier Technology (cstc2015jcyjA40050)

Abstract:

To address the problem of data transmission in wireless networks, a transmission scheduling scheme based on deep Q learning (QL) was proposed. A Markov decision process (MDP) system model was formulated to describe the state transitions of the system. The Q learning algorithm was adopted to learn and explore the state-transition information when the state-transition probabilities were unknown, so as to obtain an approximately optimal policy for the scheduling node. In addition, when the system state space was large, deep learning (DL) was employed to build the mapping between states and actions, avoiding the heavy computation and storage otherwise incurred in solving for the policy. The simulation results show that, in terms of power consumption, throughput, and packet loss rate, the proposed scheme approaches the optimal policy obtained by policy iteration. The proposed scheme also has lower algorithmic complexity and overcomes the curse of dimensionality.

Key words: wireless network transmission, Markov decision process, Q learning, deep learning
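
As a rough illustration of the Q learning step described in the abstract, the sketch below learns a scheduling policy for a toy buffer-and-channel MDP without reading the transition probabilities directly. It is not the authors' model: the state definition (buffer occupancy, two-state channel), the reward weights, every numeric constant, and the names `step`, `q_learning`, `greedy_policy`, and `average_reward` are assumptions made for illustration. It is tabular only; the deep learning mapping that the paper uses for large state spaces is not shown.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
B = 4               # buffer capacity in packets
A_MAX = 2           # maximum packets the node may send per slot
P_ARRIVAL = 0.5     # probability that one packet arrives in a slot
P_STAY = 0.8        # probability that the channel keeps its current state
POWER = (3.0, 1.0)  # power per packet in a bad (0) / good (1) channel


def step(state, action, rng):
    """Simulate one slot. The learner only sees the sampled outcome."""
    buf, ch = state
    sent = min(action, buf)                       # cannot send more than is queued
    arrival = 1 if rng.random() < P_ARRIVAL else 0
    dropped = max(0, buf - sent + arrival - B)    # overflow is lost
    next_buf = buf - sent + arrival - dropped
    next_ch = ch if rng.random() < P_STAY else 1 - ch
    # Assumed reward: throughput minus power, packet-loss and queueing penalties.
    reward = 2.0 * sent - 0.5 * POWER[ch] * sent - 5.0 * dropped - 0.2 * next_buf
    return (next_buf, next_ch), reward


def q_learning(slots=50000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    actions = range(A_MAX + 1)
    q = {((b, c), a): 0.0 for b in range(B + 1) for c in (0, 1) for a in actions}
    state = (0, 1)
    for _ in range(slots):
        if rng.random() < epsilon:
            action = rng.randrange(A_MAX + 1)                   # explore
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in actions)
        # Temporal-difference update toward reward + discounted best next value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {
        (b, c): max(range(A_MAX + 1), key=lambda a: q[((b, c), a)])
        for b in range(B + 1)
        for c in (0, 1)
    }


def average_reward(policy, slots=20000, seed=1):
    """Estimate the per-slot reward of a fixed policy by simulation."""
    rng = random.Random(seed)
    state, total = (0, 1), 0.0
    for _ in range(slots):
        state, reward = step(state, policy[state], rng)
        total += reward
    return total / slots


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    print("learned policy:", policy)
    print("average reward per slot:", average_reward(policy))
```

Running the script prints the learned state-to-action table and its simulated average reward. States that the learner rarely visits keep their initial Q values, which hints at why a function approximator becomes attractive once the state space grows.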
