Chinese Journal on Internet of Things, 2022, Vol. 6, Issue (1): 53-64. doi: 10.11959/j.issn.2096-3750.2022.00260

• Theory and Technology •

Research on deep reinforcement learning based intelligent shop scheduling method

Zihui LUO, Chengling JIANG, Liang LIU, Xiaolong ZHENG, Huadong MA

  1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised: 2022-01-21 Online: 2022-03-30 Published: 2022-03-01
  • About the authors:
    Zihui LUO (born 1996), male, Ph.D. candidate at Beijing University of Posts and Telecommunications; research interests: industrial Internet of Things, edge computing
    Chengling JIANG (born 1997), female, master's student at Beijing University of Posts and Telecommunications; research interests: intelligent optimization scheduling, deep reinforcement learning
    Liang LIU (born 1982), male, professor at Beijing University of Posts and Telecommunications; research interests: Internet of Things, intelligent sensing technology
    Xiaolong ZHENG (born 1989), male, associate professor at Beijing University of Posts and Telecommunications; research interests: Internet of Things, wireless networks, pervasive computing
    Huadong MA (born 1964), male, professor at Beijing University of Posts and Telecommunications; research interests: multimedia systems and networks, Internet of Things and sensor networks, video understanding and big data analysis
  • Supported by:
    The National Natural Science Foundation of China (62061146002); The National Natural Science Foundation of China (61632008); The National Natural Science Foundation of China (61921003); The Fundamental Research Funds for the Central Universities (2019XD-A14)

Abstract:

The unprecedented prosperity of the industrial internet of things (IIoT) has opened up a new path for the traditional industrial manufacturing model. Intelligent shop scheduling is one of the key technologies for achieving overall control and flexible production across the whole production process. It requires assigning multiple operations to multiple machines so that the makespan (the maximum completion time) is minimized. Firstly, the shop scheduling problem was defined as a Markov decision process (MDP), and a shop scheduling model based on the pointer network was established. Secondly, the job scheduling process was regarded as a mapping from one sequence to another, and a shop scheduling algorithm based on deep reinforcement learning (DRL) was proposed. The optimal parameters were determined by analyzing the convergence of the model under different parameter settings. Experimental results on public data sets of different scales and on actual production data sets show that the proposed DRL algorithm achieves better performance.

Key words: IIoT, intelligent shop scheduling, flexible production, deep reinforcement learning, shop scheduling method
