Transmission scheduling scheme based on deep Q learning in wireless network

doi:10.11959/j.issn.1000-436x.2018058

Journal on Communications, 2018, Vol. 39, Issue (4): 35-44.

Jiang ZHU, Tingting WANG, Yonghui SONG, Yali LIU

Key Laboratory of Information and Communication Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065

Revised: 2018-03-16 Online: 2018-04-01 Published: 2018-04-29
About the authors: Jiang ZHU (1977-), male, born in Jingzhou, Hubei, Ph.D., is a professor at Chongqing University of Posts and Telecommunications. His main research interests are cognitive radio, mobile communications, and network security situation awareness. | Tingting WANG (1993-), female, born in Anqing, Anhui, is a master's student at Chongqing University of Posts and Telecommunications. Her main research interest is network security situation awareness. | Yonghui SONG (1991-), male, born in Handan, Hebei, is a master's student at Chongqing University of Posts and Telecommunications. His main research interest is cognitive radio. | Yali LIU (1990-), male, born in Shangqiu, Henan, is a master's student at Chongqing University of Posts and Telecommunications. His main research interest is cognitive radio.
Supported by:
The National Natural Science Foundation of China (61102062, 61271260, 61301122); Chongqing Research Program of Basic Research and Frontier Technology (cstc2015jcyjA40050)

Abstract:

To address data transmission in wireless networks, a transmission scheduling scheme based on deep Q learning (QL) is proposed. A Markov decision process (MDP) system model is formulated to describe the state transitions of the system. The Q learning algorithm is used to learn and explore the state-transition information when the state transition probabilities are unknown, so as to obtain a near-optimal policy for the scheduling node. In addition, when the system state space is large, deep learning (DL) is used to build the mapping between states and actions, which avoids the heavy computation and storage that solving for the policy would otherwise require. Simulation results show that the proposed scheme approaches the optimal policy obtained by policy iteration in terms of power consumption, throughput, and packet loss rate, while having lower algorithmic complexity and addressing the curse of dimensionality.

Key words: wireless network transmission, Markov decision process, Q learning, deep learning
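
The abstract describes learning a scheduling policy with Q learning when the state transition probabilities are unknown. The following is a minimal tabular sketch of that idea, not the paper's model: the buffer-occupancy state, the power costs, the drop penalty, the arrival probability of 0.5, and the exploration rate are all hypothetical choices made for illustration. Only the discount factor 0.9 and the buffer pressure coefficient 0.5 are borrowed from the simulation parameters listed on this page. The learner only sees sampled transitions and rewards, never the arrival probability itself.

```python
import random

# Hypothetical toy model (not the paper's system): the state is the buffer
# occupancy and the action is the number of packets sent in the slot.
BUFFER_SIZE = 4                # states 0..4
ACTIONS = (0, 1, 2)            # packets to transmit per slot
POWER_COST = (0.0, 1.0, 4.0)   # assumed convex power cost per action
THETA = 0.5                    # buffer pressure coefficient
ARRIVAL_RATE = 0.5             # assumed probability of one arrival per slot
DROP_PENALTY = 5.0             # assumed penalty for a packet lost to overflow
GAMMA = 0.9                    # discount factor
ALPHA = 0.05                   # learning factor, chosen inside (0, 1]
EPSILON = 0.2                  # assumed exploration probability


def step(buffer, action, rng):
    """Simulate one slot; the learner never reads ARRIVAL_RATE directly."""
    sent = min(action, buffer)
    arrival = 1 if rng.random() < ARRIVAL_RATE else 0
    backlog = buffer - sent + arrival
    dropped = max(0, backlog - BUFFER_SIZE)
    next_buffer = min(BUFFER_SIZE, backlog)
    reward = -(POWER_COST[action] + THETA * buffer + DROP_PENALTY * dropped)
    return next_buffer, reward


def train(episodes=2000, steps_per_episode=100, seed=0):
    """Tabular Q learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(BUFFER_SIZE + 1)]
    for _ in range(episodes):
        # Random starting occupancy so that every state is visited.
        state = rng.randrange(BUFFER_SIZE + 1)
        for _ in range(steps_per_episode):
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward = step(state, action, rng)
            # Q learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each buffer occupancy."""
    return [max(ACTIONS, key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The table here has only 5 × 3 entries. When the state space grows, a table of this kind becomes the storage and computation bottleneck that the abstract says the deep learning mapping is meant to avoid.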

CLC number: TN929.5

Jiang ZHU, Tingting WANG, Yonghui SONG, Yali LIU. Transmission scheduling scheme based on deep Q learning in wireless network[J]. Journal on Communications, 2018, 39(4): 35-44.

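Table 2 Simulation parameters

Parameter	Value
SNR threshold/dB	snr=[-6.28, -1.28, 1.28]
Doppler shift/Hz	f_D=50
Frame length/s	T_f=2×10⁻³
Number of time slots 1	I₁=5×10³
Number of time slots 2	I₂=1×10³
Noise power	1×10⁻³
Buffer pressure coefficient	θ=0.5
Arrival rate	λ=[0.1,…,0.9]
Bit error rate constraint	BER=10⁻³
Discount factor	γ=0.9
Index weight	$ς = \frac{1}{\sqrt{2}}$
MQL learning factor	α∈(0,1]
Number of neurons in each SAE layer	[9,15,15,12]
Weight-term coefficient	μ=3×10⁻³
SAE learning rate	1×10⁻²
Training error tolerance	error=1×10⁻⁵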

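Algorithm	Depends on prior information	Exponentiations	Multiplications/divisions	Additions/subtractions	Comparisons
SI method	Yes	0	D+S	D	D
Proposed algorithm	No	2A	4A	2A	2A
W-learning method	No	2A	3A	3A	A
RS method	No	0	1	0	0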
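
To make the comparison above concrete, the per-algorithm operation counts can be evaluated for specific sizes. The sketch below simply encodes the four rows of the table. The symbols A, D, and S are not defined on this page, so they are treated as plain integer inputs, and the example values (3, 100, 50) are hypothetical. Summing the four operation types with equal weight is also an assumption made for illustration, since an exponentiation usually costs more than a comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCounts:
    """Operation counts for one algorithm, mirroring the table columns."""
    exponentiations: int
    mul_div: int
    add_sub: int
    comparisons: int

    def total(self) -> int:
        # Equal weighting of all operation types is an illustrative assumption.
        return (self.exponentiations + self.mul_div
                + self.add_sub + self.comparisons)


def operation_counts(a: int, d: int, s: int) -> dict:
    """Evaluate each table row for given values of the symbols A, D and S."""
    return {
        "SI method": OpCounts(0, d + s, d, d),
        "Proposed algorithm": OpCounts(2 * a, 4 * a, 2 * a, 2 * a),
        "W-learning method": OpCounts(2 * a, 3 * a, 3 * a, a),
        "RS method": OpCounts(0, 1, 0, 0),
    }


# Hypothetical sizes, chosen only to show how the rows scale.
example = operation_counts(a=3, d=100, s=50)
for name, counts in example.items():
    print(name, counts.total())
```

With these made-up sizes, the SI row grows with D and S while the proposed algorithm and the W-learning method depend only on A, which is the contrast the table expresses symbolically.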
