Journal on Communications ›› 2023, Vol. 44 ›› Issue (6): 90-102. doi: 10.11959/j.issn.1000-436x.2023119

• Academic Paper •

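Scheduling framework based on reinforcement learning in online-offline colocated cloud environment

Ling MA1, Qiliang FAN1, Ting XU1, Guanchen GUO2, Shenglin ZHANG1, Yongqian SUN1, Yuzhi ZHANG1

  • 1 College of Software, Nankai University, Tianjin 300350, China
    2 School of Computer Science, Peking University, Beijing 100871, China
  • Revised: 2023-06-13 Online: 2023-06-25 Published: 2023-06-01
  • About the authors: Ling MA (1985- ), female, born in Taonan, Jilin, Ph.D., associate professor at Nankai University. Her main research interest is artificial intelligence.
    Qiliang FAN (1999- ), male, born in Nanchang, Jiangxi, master's student at Nankai University. His main research interest is time series anomaly detection.
    Ting XU (1991- ), female, born in Zhengyang, Henan, Ph.D. student at Nankai University. Her main research interests include anomaly detection, fault localization, root cause analysis, and failure prediction.
    Guanchen GUO (2000- ), female, born in Zibo, Shandong, master's student at Peking University. Her main research interests include anomaly detection, fault localization, root cause analysis, and failure prediction.
    Shenglin ZHANG (1989- ), male, born in Binzhou, Shandong, Ph.D., associate professor at Nankai University. His main research interests include anomaly detection, fault localization, root cause analysis, and failure prediction.
    Yongqian SUN (1988- ), male, born in Shijiazhuang, Hebei, Ph.D., assistant professor at Nankai University. His main research interests include anomaly detection, fault localization, root cause analysis, and failure prediction.
    Yuzhi ZHANG (1964- ), male, born in Xingtai, Hebei, Ph.D., chair professor and doctoral supervisor at Nankai University. His main research interests are artificial intelligence and software engineering.
  • Supported by:
    The National Natural Science Foundation of China (62272249); The National Natural Science Foundation of China (61901234)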

Abstract:

Existing reinforcement learning-based scheduling algorithms for cloud computing platforms either consider only a single scenario, or ignore the resource constraints of jobs and treat all machines as the same type, which leads to low resource utilization and poor scheduling efficiency. To address scheduling in online-offline colocated cloud environments, a framework named JobFusion is proposed. First, a hierarchical clustering method with connectivity constraints is integrated to build an efficient resource partitioning scheme for cloud computing platforms based on virtualization technology. Second, to address scalability, a graph convolutional neural network is used to embed jobs with arbitrary hierarchical constraint relationships and in arbitrary numbers, capturing information such as the critical path of a workflow. Finally, high-performing reinforcement learning models are integrated to schedule the jobs. Experimental results show that, compared with the baseline methods, JobFusion improves resource utilization by 39.86% and reduces the average job completion time by up to 64.36%.

Key words: reinforcement learning, graph embedding, hierarchical clustering, cloud computing, virtualization
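
The abstract states that the job embedding is meant to capture information such as the critical path of a workflow. As a point of reference only, the sketch below computes that quantity directly for a small job graph; it is not JobFusion's implementation and involves no graph neural network. The function name `critical_path`, the dictionary of task durations, and the edge-list format are all assumptions made for illustration.

```python
from collections import deque


def critical_path(durations, edges):
    """Return (length, path) of the longest duration-weighted path in a DAG.

    durations: dict mapping task name -> run time (hypothetical input format).
    edges: list of (u, v) pairs meaning task u must finish before v starts.
    """
    if not durations:
        return 0, []

    successors = {task: [] for task in durations}
    indegree = {task: 0 for task in durations}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Earliest finish time of each task; tasks without predecessors
    # finish after their own duration.
    finish = dict(durations)
    previous = {task: None for task in durations}

    # Kahn's algorithm: visit tasks in topological order so that a task's
    # finish time is final before it is propagated to its successors.
    queue = deque(task for task in durations if indegree[task] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            candidate = finish[u] + durations[v]
            if candidate > finish[v]:
                finish[v] = candidate
                previous[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if visited != len(durations):
        raise ValueError("workflow graph contains a cycle")

    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = previous[node]
    return length, path[::-1]
```

For a diamond-shaped workflow with durations a=3, b=2, c=4, d=1 and edges a→b, a→c, b→d, c→d, the function returns a length of 8 along the path a, c, d. A learned embedding would have to encode this kind of structural information without recomputing it for every scheduling decision.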
