Telecommunications Science, 2021, Vol. 37, Issue (11): 41-50. doi: 10.11959/j.issn.1000-0801.2021244

• Research and Development •

Research on resource allocation algorithm of centralized and distributed Q-learning in machine communication

Yunhe YU, Jun SUN

  1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Revised: 2021-10-20 Online: 2021-11-20 Published: 2021-11-01
  • About the authors: Yunhe YU (1995− ), male, master's student at the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications; his main research interest is resource allocation in massive machine type communication networks
    Jun SUN (1980− ), female, master's supervisor at Nanjing University of Posts and Telecommunications; her main research interest is wireless network resource management
  • Supported by:
    The National Natural Science Foundation of China (61771255); Open Project of Key Laboratory of Chinese Academy of Sciences (20190904)

Abstract:

For the massive machine type communication (mMTC) scenario, two resource allocation algorithms based on Q-learning are proposed: a centralized Q-learning algorithm (team-Q) and a distributed Q-learning algorithm (dis-Q). The goal is to maximize system throughput while guaranteeing the quality of service (QoS) requirements of some of the machine type communication devices (MTCDs). First, a clustering algorithm based on cosine similarity (CS) is designed that takes into account the geographic locations and multi-level QoS requirements of the MTCDs: multi-dimensional vectors representing the MTCDs and the data aggregators (DAs) are constructed, and the MTCDs are grouped according to the CS values between these vectors. Then the team-Q and dis-Q learning algorithms are each used to allocate resource blocks (RBs) and power to the MTCDs. In terms of throughput, the team-Q and dis-Q algorithms achieve average improvements of 16% and 23% over the dynamic resource allocation algorithm and the greedy algorithm, respectively. In terms of complexity, the dis-Q algorithm needs at most 25% of the complexity of the team-Q algorithm, and its convergence speed is nearly 40% higher.

Key words: resource allocation, centralized Q-learning, distributed Q-learning, cosine similarity, multi-dimensional vector
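The grouping step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact vector layout or the assignment rule, so two assumptions are made here: each vector is a hypothetical (x, y, QoS level) triple, and each MTCD joins the DA whose vector has the highest CS value with its own. The function names are illustrative and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def group_by_cosine_similarity(mtcd_vectors, da_vectors):
    """Assign each MTCD index to the DA index with the highest CS value.

    Assumption: "most similar DA wins" is one plausible grouping rule;
    the paper's actual rule may differ.
    """
    groups = {j: [] for j in range(len(da_vectors))}
    for i, mtcd in enumerate(mtcd_vectors):
        best_da = max(
            range(len(da_vectors)),
            key=lambda j: cosine_similarity(mtcd, da_vectors[j]),
        )
        groups[best_da].append(i)
    return groups


# Hypothetical (x, y, QoS level) vectors, for illustration only.
mtcds = [(1.0, 0.1, 1.0), (0.1, 1.0, 2.0), (0.9, 0.2, 1.0)]
das = [(1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
groups = group_by_cosine_similarity(mtcds, das)
print(groups)
```

With these made-up inputs, MTCDs 0 and 2 point in nearly the same direction as DA 0 and MTCD 1 as DA 1, so the sketch places them in those two groups. Because cosine similarity ignores vector magnitude, a real design would need to choose carefully how location and QoS level are encoded.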

