Research on deep reinforcement learning based intelligent shop scheduling method

Chinese Journal on Internet of Things, 2022, Vol. 6, Issue (1): 53-64. doi: 10.11959/j.issn.2096-3750.2022.00260

Zihui LUO, Chengling JIANG, Liang LIU, Xiaolong ZHENG, Huadong MA

School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China

Revised: 2022-01-21 Online: 2022-03-30 Published: 2022-03-01
About the authors:
Zihui LUO (1996- ), male, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. His main research interests are the industrial Internet of things and edge computing.
Chengling JIANG (1997- ), female, is a master's student at Beijing University of Posts and Telecommunications. Her main research interests are intelligent optimization scheduling and deep reinforcement learning.
Liang LIU (1982- ), male, is a professor at Beijing University of Posts and Telecommunications. His main research interests are the Internet of things and intelligent sensing technology.
Xiaolong ZHENG (1989- ), male, is an associate professor at Beijing University of Posts and Telecommunications. His main research interests are the Internet of things, wireless networks, and pervasive computing.
Huadong MA (1964- ), male, is a professor at Beijing University of Posts and Telecommunications. His main research interests are multimedia systems and networks, the Internet of things and sensor networks, and video understanding and big data analysis.
Supported by:
The National Natural Science Foundation of China (62061146002); The National Natural Science Foundation of China (61632008); The National Natural Science Foundation of China (61921003); The Fundamental Research Funds for the Central Universities (2019XD-A14)

Abstract:

The unprecedented prosperity of the industrial Internet of things (IIoT) has opened up a new path for the traditional industrial manufacturing model. Intelligent shop scheduling is one of the key technologies for achieving overall control and flexible production across the whole production process. It requires an effective plan that assigns multiple operations to multiple machines while minimizing the makespan. First, the shop scheduling problem is formulated as a Markov decision process (MDP), and a shop scheduling model based on the pointer network is established. Second, the job scheduling process is treated as a mapping from one sequence to another, and a new shop scheduling algorithm based on deep reinforcement learning (DRL) is proposed. The optimal parameters are determined by analyzing the convergence of the model under different parameter settings. Experimental results on public data sets of different scales and on actual production data sets show that the proposed DRL algorithm achieves better performance.

Key words: IIoT, intelligent shop scheduling, flexible production, deep reinforcement learning, shop scheduling method
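
The objective named in the abstract, the makespan, can be illustrated with a minimal Python sketch. It assumes a permutation flow shop, in which every job visits the machines in the same order and each machine processes the jobs in the same sequence; the exact shop model used in the paper is not given on this page. The names `makespan`, `proc`, and `order`, and the toy processing times, are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def makespan(proc: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Completion time of the last job on the last machine.

    proc[j][k] is the processing time of job j on machine k (assumed layout);
    order is the job sequence applied on every machine.
    """
    n_machines = len(proc[0])
    # finish[k]: completion time of the most recently scheduled job on machine k
    finish = [0] * n_machines
    for j in order:
        for k in range(n_machines):
            # Job j can start on machine k once it has left machine k-1
            # and machine k has finished the previous job.
            ready = finish[k - 1] if k > 0 else 0
            finish[k] = max(finish[k], ready) + proc[j][k]
    return finish[-1]


# Toy instance: 3 jobs, 2 machines (illustrative numbers only).
proc = [[3, 2], [1, 4], [2, 1]]
print(makespan(proc, [0, 1, 2]))  # 10
print(makespan(proc, [1, 0, 2]))  # 8
```

Under these assumptions, a scheduling policy such as the sequence-to-sequence model described in the abstract would output `order`, and this quantity is what it tries to make small.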

CLC number: TP18

Zihui LUO, Chengling JIANG, Liang LIU, Xiaolong ZHENG, Huadong MA. Research on deep reinforcement learning based intelligent shop scheduling method[J]. Chinese Journal on Internet of Things, 2022, 6(1): 53-64.

N×K	FIFO	LIFO	LPT	SPT	ACO	GA	ABC-TS	DRL_IG	Ours
20×5	1404	1513	1681	1339	1190	1212	1187	1108	1108
20×10	2051	2103	2059	1833	1709	1741	1654	1601	1594
50×5	3188	3397	3869	3089	2869	2864	2805	2782	2782
50×10	3845	3945	4088	3927	3450	3436	3426	3100	3091
100×5	6157	5897	6440	5821	5558	5563	5453	5322	5328
100×10	6930	6915	7494	6564	6313	6288	6256	5864	5845
200×10	12274	12379	13359	12155	11361	11899	11336	10716	10976
Average	5121	5164	5570	4961	4636	4715	4588	4356	4355

N	M_k × K	FIFO	LIFO	LPT	SPT	ACO	GA	ABC-TS	Ours
50	(1, 1, 1, 1)	2163	2164	2170	2162	2162	2162	2162	2162
50	(2, 2, 2, 1)	1122	1117	1118	1123	1109	1109	1109	1109
50	(3, 3, 3, 1)	774	765	773	787	761	761	761	761
100	(1, 1, 1, 1)	6514	6508	6544	6537	6475	6474	6474	6454
100	(2, 2, 2, 1)	3321	3337	3343	3375	3308	3302	3303	3266
100	(3, 3, 3, 1)	2278	2251	2269	2301	2217	2226	2213	2204
200	(1, 1, 1, 1)	12867	12887	12916	12909	12849	12849	12849	12587
200	(2, 2, 2, 1)	6578	6492	6516	6522	6472	6464	6463	6415
200	(3, 3, 3, 1)	4474	4361	4390	4435	4338	4339	4332	4332
	Average	4455	4431	4449	4461	4410	4409	4407	4365
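
The FIFO, LIFO, LPT, and SPT columns in the tables are commonly read as the first-in-first-out, last-in-first-out, longest-processing-time, and shortest-processing-time priority rules. The Python sketch below shows one plausible way such rules turn a set of jobs into a processing sequence. It assumes that the job index equals the arrival order and that LPT and SPT rank jobs by total processing time over all machines; the page does not state how the paper defines these baselines, so both choices and the name `dispatch_orders` are hypothetical.

```python
from typing import Dict, List, Sequence


def dispatch_orders(proc: Sequence[Sequence[int]]) -> Dict[str, List[int]]:
    """Job sequences produced by four simple priority rules.

    proc[j][k] is the processing time of job j on machine k (assumed layout).
    Jobs are assumed to arrive in index order 0, 1, 2, ...
    """
    jobs = list(range(len(proc)))
    total = [sum(row) for row in proc]  # total work per job (assumed ranking key)
    return {
        "FIFO": jobs,                                  # earliest arrival first
        "LIFO": jobs[::-1],                            # latest arrival first
        "SPT": sorted(jobs, key=lambda j: total[j]),   # shortest total time first
        "LPT": sorted(jobs, key=lambda j: -total[j]),  # longest total time first
    }


# Toy instance: 3 jobs, 2 machines (illustrative numbers only).
orders = dispatch_orders([[3, 2], [1, 4], [2, 1]])
print(orders["SPT"])  # [2, 0, 1]
print(orders["LPT"])  # [0, 1, 2]
```

Ties keep arrival order because Python's `sorted` is stable. Rules of this kind need no training and run in sorting time, which is why they typically serve as the weakest baselines in comparisons like the ones tabulated above.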
