电信科学 (Telecommunications Science), 2022, Vol. 38, Issue (6): 82-90. doi: 10.11959/j.issn.1000-0801.2022152

• Research and Development •

Value-difference learning based mMTC device access algorithm in multi-cell networks

Xin LI1, Jun SUN1,2

  1 College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2 Jiangsu Key Laboratory of Wireless Communications, Nanjing 210003, China
  • Revised: 2022-04-06   Online: 2022-06-20   Published: 2022-06-01
  • About the authors: LI Xin (1997- ), female, is an M.S. candidate at the College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications; her research interests include random access for massive machine type communication (mMTC) devices.
    SUN Jun (1980- ), female, is an associate researcher and M.S. supervisor at Nanjing University of Posts and Telecommunications; her research interests include wireless networks, radio resource management, and the Internet of things.
  • Supported by:
    The National Natural Science Foundation of China (61771255); the Provincial and Ministerial Key Laboratory Open Project (20190904)


Abstract:

In the 5G massive machine type communication (mMTC) scenario, access congestion among massive machine type communication devices (mMTCDs) in multi-cell networks is a pressing problem. A double deep Q network with value-difference based exploration (VDBE-DDQN) algorithm was proposed. The algorithm focused on reducing collisions when a large number of mMTCDs access eNBs in a multi-cell network, and the state transition process of the deep reinforcement learning algorithm was modeled as a Markov decision process. Furthermore, the algorithm used a double deep Q network to fit the target state-action value function, and it employed an exploration strategy based on value difference to adapt to changes in the environment, which could exploit both current conditions and expected future needs. Moreover, each mMTCD updated its exploration probability according to the difference between the current value function and the next value function estimated by the network, rather than using a common standard for all devices, so as to select the best eNB. Simulation results show that the proposed algorithm can effectively improve the access success rate of the system.
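To make the mechanism concrete, the sketch below pairs a double-DQN value difference (TD error) with the value-difference based exploration rule of Tokic (2010), which maps that difference to a per-device exploration probability. This is a minimal, hypothetical Python/PyTorch illustration of the two ideas named in the abstract; the network sizes, sigma, delta, and the state/reward design are placeholder assumptions, not the paper's settings.

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a device's state to Q-values over candidate eNBs."""
    def __init__(self, state_dim: int, n_enbs: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_enbs))
    def forward(self, s):
        return self.net(s)

def ddqn_td_error(online, target, s, a, r, s_next, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net
    evaluates it; the returned value difference (TD error) drives VDBE."""
    with torch.no_grad():
        a_next = online(s_next).argmax().item()
        y = r + gamma * target(s_next)[a_next].item()
        return y - online(s)[a].item()

def vdbe_epsilon(eps, td_error, sigma=1.0, delta=0.25):
    """VDBE (Tokic, 2010): a large |TD error| raises epsilon (explore more),
    a small one lets epsilon decay toward exploitation. delta=0.25 mirrors
    the common 1/|actions| choice for 4 candidate eNBs."""
    f = (1 - np.exp(-abs(td_error) / sigma)) / (1 + np.exp(-abs(td_error) / sigma))
    return delta * f + (1 - delta) * eps

# Per-device epsilon-greedy eNB selection with an individually updated epsilon.
state_dim, n_enbs = 8, 4
online, target = QNet(state_dim, n_enbs), QNet(state_dim, n_enbs)
target.load_state_dict(online.state_dict())
eps, s = 1.0, torch.zeros(state_dim)
a = np.random.randint(n_enbs) if np.random.rand() < eps else online(s).argmax().item()
r, s_next = 1.0, torch.zeros(state_dim)   # e.g., reward 1 on successful access
eps = vdbe_epsilon(eps, ddqn_td_error(online, target, s, a, r, s_next))

Because each device keeps its own eps and updates it from its own observed value differences, exploration remains high only where the value estimates are still changing, which is the per-device behavior the abstract contrasts with a uniform exploration standard.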

Key words: mMTC, random access, reinforcement learning, eNB selection

