电信科学 (Telecommunications Science), 2022, Vol. 38, Issue (6): 82-90. doi: 10.11959/j.issn.1000-0801.2022152

• Research and Development •

Value-difference learning based mMTC device access algorithm in multi-cell networks

Xin LI1, Jun SUN1,2

  1 College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2 Jiangsu Key Laboratory of Wireless Communications, Nanjing 210003, China
  • Revised: 2022-04-06   Online: 2022-06-20   Published: 2022-06-01
  • About the authors: LI Xin (1997- ), female, is an M.S. candidate at the College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications; her research interests include random access for massive machine type communication (mMTC) devices.
    SUN Jun (1980- ), female, is an associate researcher and M.S. supervisor at Nanjing University of Posts and Telecommunications; her research interests include wireless networks, radio resource management, and the Internet of things.
  • Supported by:
    The National Natural Science Foundation of China (61771255); the Provincial and Ministerial Key Laboratory Open Project (20190904)


Abstract:

In the 5G massive machine type communication (mMTC) scenario, access congestion among massive machine type communication devices (mMTCDs) in multi-cell networks is a pressing problem. A double deep Q network with value-difference based exploration (VDBE-DDQN) algorithm was proposed. The algorithm focused on reducing collisions when a large number of mMTCDs access eNBs in a multi-cell network, and the state transition process of the deep reinforcement learning algorithm was modeled as a Markov decision process. Furthermore, the algorithm used a double deep Q network to fit the target state-action value function, and it employed an exploration strategy based on value difference to adapt to changes in the environment, which could exploit both current conditions and expected future needs. Moreover, each mMTCD updated its exploration probability according to the difference between the current value function and the next value function estimated by the network, rather than using a common standard for all devices, so as to select the best eNB. Simulation results show that the proposed algorithm can effectively improve the access success rate of the system.
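To make the mechanism concrete, the sketch below pairs a double-DQN value difference (TD error) with the value-difference based exploration rule of Tokic (2010), which maps that difference to a per-device exploration probability. This is a minimal, hypothetical Python/PyTorch illustration of the two ideas named in the abstract; the network sizes, sigma, delta, and the state/reward design are placeholder assumptions, not the paper's settings.

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a device's state to Q-values over candidate eNBs."""
    def __init__(self, state_dim: int, n_enbs: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_enbs))
    def forward(self, s):
        return self.net(s)

def ddqn_td_error(online, target, s, a, r, s_next, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net
    evaluates it; the returned value difference (TD error) drives VDBE."""
    with torch.no_grad():
        a_next = online(s_next).argmax().item()
        y = r + gamma * target(s_next)[a_next].item()
        return y - online(s)[a].item()

def vdbe_epsilon(eps, td_error, sigma=1.0, delta=0.25):
    """VDBE (Tokic, 2010): a large |TD error| raises epsilon (explore more),
    a small one lets epsilon decay toward exploitation. delta=0.25 mirrors
    the common 1/|actions| choice for 4 candidate eNBs."""
    f = (1 - np.exp(-abs(td_error) / sigma)) / (1 + np.exp(-abs(td_error) / sigma))
    return delta * f + (1 - delta) * eps

# Per-device epsilon-greedy eNB selection with an individually updated epsilon.
state_dim, n_enbs = 8, 4
online, target = QNet(state_dim, n_enbs), QNet(state_dim, n_enbs)
target.load_state_dict(online.state_dict())
eps, s = 1.0, torch.zeros(state_dim)
a = np.random.randint(n_enbs) if np.random.rand() < eps else online(s).argmax().item()
r, s_next = 1.0, torch.zeros(state_dim)   # e.g., reward 1 on successful access
eps = vdbe_epsilon(eps, ddqn_td_error(online, target, s, a, r, s_next))

Because each device keeps its own eps and updates it from its own observed value differences, exploration remains high only where the value estimates are still changing, which is the per-device behavior the abstract contrasts with a uniform exploration standard.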

Key words: mMTC, random access, reinforcement learning, eNB selection

