Intelligent routing strategy in the Internet of things based on deep reinforcement learning

Chinese Journal on Internet of Things, 2019, Vol. 3, Issue (2): 56-63. doi: 10.11959/j.issn.2096-3750.2019.00097

Ruijin DING¹, Feifei GAO¹, Ling XING²

¹ Department of Automation, Tsinghua University, Beijing 100084, China
² Henan University of Science and Technology, Luoyang 471023, China

Revised: 2019-03-07 Online: 2019-06-30 Published: 2019-07-17

About the authors:
Ruijin DING (1994- ), male, a native of Nantong, Jiangsu, is a Ph.D. candidate at Tsinghua University. His main research interest is AI and its applications in communications.
Feifei GAO (1980- ), male, a native of Xi'an, Shaanxi, Ph.D., is an associate professor and doctoral supervisor at Tsinghua University. His main research interests are multi-antenna communications and intelligent signal processing.
Ling XING (1978- ), female, a native of Chengdu, Sichuan, Ph.D., is a professor and doctoral supervisor at Henan University of Science and Technology. Her main research interests are intelligent information transmission, computer networks, and multimedia technology.

Abstract:

In the era of the Internet of things, a networking mode that connects everything causes explosive growth in data volume and poses severe challenges to traditional routing protocols. The limitations of existing routing protocols under rapidly growing data volume are analyzed, and the routing selection problem is re-modeled as a Markov decision process. On this basis, deep reinforcement learning is used to choose the next-hop router for each data transmission task, so that the transmission path is kept as short as possible while network congestion is avoided. Simulation results show that the proposed strategy significantly reduces the congestion probability and increases network throughput.

Key words: deep reinforcement learning, routing, Internet of things, network congestion
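
The formulation summarized in the abstract (routing as a Markov decision process in which each action is the choice of a next-hop router) can be illustrated with a toy sketch. This is not the authors' method: it swaps the deep network for a tabular Q-learning table, and the 5-router topology, the congested router, the reward values, and all names (NEIGHBORS, CONGESTED, reward, train, greedy_path) are assumptions made up for illustration only.

```python
import random

# Hypothetical 5-router topology (not from the paper): router -> neighbors.
NEIGHBORS = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3, 4],
    3: [1, 2, 4],
    4: [],  # destination router, treated as a terminal state
}
DEST = 4
CONGESTED = {2}  # assumed overloaded router that the policy should avoid


def reward(next_router):
    """Assumed reward: -1 per hop, -10 for entering a congested router,
    +10 for reaching the destination."""
    if next_router in CONGESTED:
        return -10.0
    if next_router == DEST:
        return 10.0
    return -1.0


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: state = current router, action = next-hop router."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, nbrs in NEIGHBORS.items() for a in nbrs}
    for _ in range(episodes):
        state = 0  # every task starts at router 0 in this toy setup
        for _ in range(20):  # cap on hops per episode
            actions = NEIGHBORS[state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            nxt = action  # moving to the chosen neighbor is deterministic
            # Value of the best next hop from nxt (0 at the destination).
            future = max((q[(nxt, a)] for a in NEIGHBORS[nxt]), default=0.0)
            target = reward(nxt) + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if state == DEST:
                break
    return q


def greedy_path(q, source=0, max_hops=10):
    """Follow the highest-valued next hop from source until DEST."""
    path = [source]
    while path[-1] != DEST and len(path) <= max_hops:
        state = path[-1]
        path.append(max(NEIGHBORS[state], key=lambda a: q[(state, a)]))
    return path


q_table = train()
print(greedy_path(q_table))
```

Under these assumed rewards, the learned route should detour around router 2 even though the path through it has fewer hops, which mirrors the trade-off between path length and congestion described in the abstract. A deep Q-network would replace the table q with a neural network that maps a richer network state to next-hop values.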

CLC number: TN915

Cite this article: Ruijin DING, Feifei GAO, Ling XING. Intelligent routing strategy in the Internet of things based on deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2019, 3(2): 56-63.

Figures/Tables: 8 (Figures 1-7 and Table 1)

References (15)

[1]	SUN Q B, LIU J, LI S, et al. Internet of things: summarize on concepts, architecture and key technology problem[J]. Journal of Beijing University of Posts and Telecommunications, 2010, 33(3): 1-9.
[2]	LIU V, PARKS A, TALLA V, et al. Ambient backscatter: wireless communication out of thin air[C]// ACM SIGCOMM Computer Communication Review. ACM, 2013, 43(4): 39-50.
[3]	QIAN J, GAO F, WANG G, et al. Noncoherent detections for ambient backscatter system[J]. IEEE Transactions on Wireless Communications, 2017, 16(3): 1412-1422.
[4]	NORDRUM A. The Internet of fewer things[J]. IEEE Spectrum, 2016, 53(10): 12-13.
[5]	QIAN J, PARKS A N, SMITH J R, et al. IoT communications with M-PSK modulated ambient backscatter: algorithm, analysis and implementation[J]. IEEE Internet of Things Journal, 2019, 6(1): 844-855.
[6]	FORTZ B, THORUP M. Internet traffic engineering by optimizing OSPF weights[J]. IEEE INFOCOM, 2000, 2(3): 519-528.
[7]	HEDRICK C L. Routing information protocol[R]. 1988.
[8]	FORTZ B, THORUP M. Optimizing OSPF/IS-IS weights in a changing world[J]. IEEE Journal on Selected Areas in Communications, 2002, 20(4): 756-767.
[9]	GRIFFIN T G, SHEPHERD F B, WILFONG G. The stable paths problem and interdomain routing[J]. IEEE/ACM Transactions on Networking (ToN), 2002, 10(2): 232-243.
[10]	SUN Z J, XUE L, XU Y M, et al. Overview of deep learning[J]. Application Research of Computers, 2012, 29(8): 2806-2810.
[11]	KATO N, FADLULLAH Z M, MAO B, et al. The deep learning vision for heterogeneous network traffic control: proposal, challenges and future perspective[J]. IEEE Wireless Communications, 2017, 24(3): 146-153.
[12]	TANG F, MAO B, FADLULLAH Z M, et al. On removing routing protocol from future wireless networks: a real-time deep learning approach for intelligent traffic control[J]. IEEE Wireless Communications, 2018, 25(1): 154-160.
[13]	GAO Y, CHEN S F, LU X. Research on reinforcement learning technology: a review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100.
[14]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[15]	BOYAN J A, LITTMAN M L. Packet routing in dynamically changing networks: a reinforcement learning approach[C]// Advances in Neural Information Processing Systems. Morgan Kaufmann Publishers Inc, 1994: 671-678.

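Table 1

Method	Number of neural networks
Proposed algorithm	1
Reference [12], without backward transmission	16
Reference [12], with backward transmission	∞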


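Related articles (15)

[1]	WU Jing, LI Sheng, ZHANG Jing, XIN Ming, TAO Ruowen, ZHOU Zhou, PAN Lijia, SHI Yi. Novel flexible sensors for the Internet of things[J]. Chinese Journal on Internet of Things, 2023, 7(2): 1-14.
[2]	LIANG Junge, SONG Yiran, SUN Yangfan, JI Yingying, PAN Lijia, SHI Yi. Research progress on the Internet of things for human health based on wearable and implantable technologies[J]. Chinese Journal on Internet of Things, 2023, 7(2): 26-34.
[3]	GENG Guanglei, GAO Bo, XIONG Ke, FAN Pingyi, LU Yang, WANG Yuwei. A survey of federated learning-empowered 6G networks[J]. Chinese Journal on Internet of Things, 2023, 7(2): 50-66.
[4]	WEI Nongyu, JIANG Zilong, CHEN Fangjiong. AODV for acoustic-radio cooperative networks based on location information and energy balancing[J]. Chinese Journal on Internet of Things, 2023, 7(1): 27-36.
[5]	LIAO Cenhuishan, CHEN Junyan, LIANG Guanping, XIE Xiaolan, LU Xiaoye. Intelligent QoS optimization algorithm for SDN based on deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2023, 7(1): 73-82.
[6]	SHEN Bin, LI Yinbo, LIANG Xiaowei. Spectrum access control for cognitive Internet of things users based on enhanced weighted centroid localization[J]. Chinese Journal on Internet of Things, 2023, 7(1): 93-108.
[7]	WANG Jing, HE Lesheng, LI Zhonghong, LI Luchi, YANG Hang. Hardware-software co-design of ASCON, a lightweight authenticated encryption algorithm for the Internet of things[J]. Chinese Journal on Internet of Things, 2022, 6(4): 139-148.
[8]	JIANG Weijin, LUO Tiantian, YANG Ying, LI En, ZHOU Wenying. Blockchain-based access control model for private data in the Internet of things environment[J]. Chinese Journal on Internet of Things, 2022, 6(4): 169-182.
[9]	XING Fangyuan, HE Shibo, SUN Mingyang, CHEN Jiming. Carbon emission monitoring based on the "cloud-pipe-edge-end" Internet of things architecture[J]. Chinese Journal on Internet of Things, 2022, 6(4): 53-64.
[10]	ZHANG Zaichen, YOU Xiaohu, DANG Jian, WU Liang, ZHU Bingcheng, CHEN Ji, WANG Lei. Optical wireless communication and the Internet of things[J]. Chinese Journal on Internet of Things, 2022, 6(3): 1-13.
[11]	HUANG Nuo, LIU Weijie, GONG Chen. Petahertz communication for the industrial Internet of things[J]. Chinese Journal on Internet of Things, 2022, 6(3): 37-46.
[12]	SUN Jun, ZHAO Shangweikang. Energy-saving computation offloading scheme based on the Sarsa algorithm in the industrial Internet of things[J]. Chinese Journal on Internet of Things, 2022, 6(3): 82-90.
[13]	LIU Yang, LI Cuican, PENG Mugen. Low-power underwater Internet of things: vision and key technologies[J]. Chinese Journal on Internet of Things, 2022, 6(2): 1-9.
[14]	YANG Jing, XIE Jinfeng, CHEN Yi. Research on the evaluation and certification system for Internet of things terminals in China's smart city scenarios[J]. Chinese Journal on Internet of Things, 2022, 6(2): 26-37.
[15]	LUO Dan, XU Ruzhi, GUAN Zhitao. Deep learning-based differential privacy budget optimization method in the Internet of things environment[J]. Chinese Journal on Internet of Things, 2022, 6(2): 65-76.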