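A DQN-based approach for energy-efficient train driving control

Chinese Journal of Intelligent Science and Technology, 2020, Vol. 2, Issue (4): 372-384. doi: 10.11959/j.issn.2096-6652.202040

Shuai SU¹, Qingyang ZHU¹, Qinglai WEI², Tao TANG¹, Jiateng YIN¹

¹ State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
² The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Revised: 2020-12-02 Online: 2020-12-15 Published: 2020-12-01
About the authors: Shuai SU (1987- ), male, Ph.D., is an associate professor at the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. His research interests include energy-efficient train control, intelligent train scheduling, and reinforcement learning algorithms.
Qingyang ZHU (1996- ), male, is a master's student at the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. His research interests include energy-efficient train driving control and reinforcement learning algorithms.
Qinglai WEI (1979- ), male, Ph.D., is a researcher at the Institute of Automation, Chinese Academy of Sciences, deputy director of the State Key Laboratory for Management and Control of Complex Systems, a professor at the University of Chinese Academy of Sciences, and director of the Intelligent Technology Innovation Center at the Qingdao Academy of Intelligent Industries. His research interests include self-learning control, parallel control, adaptive dynamic programming, intelligent control, optimal control, and their industrial applications.
Tao TANG (1963- ), male, Ph.D., is director of the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, and dean and professor of the School of Electronic and Information Engineering. His research interests include high-speed railway control, intelligent control theory, and intelligent transportation theory.
Jiateng YIN (1992- ), male, Ph.D., is an associate professor at the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. His research interests include intelligent train control and scheduling, machine learning, and energy-efficient train driving control.
Supported by:
The National Natural Science Foundation of China (61803021); The National Natural Science Foundation of China (U1734210); The Natural Science Foundation of Beijing of China (L191015)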

Abstract

Traction energy consumption in rail transit is growing rapidly as networks expand and operational headways shorten. Optimizing train driving strategies to reduce traction energy is therefore of great significance for energy saving and emission reduction in rail transit systems. To address the driving strategy optimization problem, an energy-efficient train driving control method based on the deep Q-network (DQN) is proposed. First, the traditional energy-efficient train driving problem is presented and its inverse problem is formulated, i.e., distributing the fewest energy consumption units needed to achieve the scheduled trip time. The problem is then reformulated as a finite Markov decision process (MDP), and a DQN-based energy-efficient driving control method is built by designing the state-action value function and defining the action selection policy. The DQN is trained on real driving data to obtain the optimal state-action value function, which determines the optimal energy distribution scheme and, in turn, the optimal driving strategy. Finally, simulation cases built on real operational data from the Yizhuang Line of the Beijing Subway are used to verify the effectiveness of the method, and a sensitivity analysis of its parameters is carried out. The proposed method makes full use of train driving data to improve the driving strategy and reduce traction energy consumption, and it offers a useful reference for the future development of intelligent urban rail transit in China.

Key words: energy-efficient train driving, driving strategy, deep Q-network
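
The abstract refers to a state-action value function and an action selection policy, but this page does not give their exact form. As a rough illustration only, the sketch below shows two ingredients of the standard DQN algorithm: epsilon-greedy action selection and the one-step temporal-difference target. The function names, the parameters `epsilon` and `gamma`, and the idea that each action index stands for one energy-allocation choice are assumptions made here, not details taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one.

    q_values: estimated Q(s, a) for every action a in the current state.
    In an energy-allocation setting, each index could stand for one way of
    spending the next energy unit (hypothetical interpretation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """One-step DQN target: r if the episode ended, else r + gamma * max Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    print("greedy action:", epsilon_greedy(q, epsilon=0.0))
    print("target:", td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.5, done=False))
```

In a full DQN, `next_q_values` would come from a separate target network and the squared difference between the target and the online network's estimate would be minimized by gradient descent; both are omitted here to keep the sketch short.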

CLC number: U238.4

Shuai SU, Qingyang ZHU, Qinglai WEI, et al. A DQN-based approach for energy-efficient train driving control[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(4): 372-384.

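Figures/Tables 12

Table 1 Operating modes for energy-efficient train driving control

Range of the slack term	Relative traction force	Relative braking force	Operating mode
$\frac{L_{1}}{m v} < 0$	$u_f = 0$	$u_b = 1$	MB
$\frac{L_{1}}{m v} = 0$	$u_f = 0$	$u_b \in [0, 1]$	Partial braking (PB)
$\frac{L_{1}}{m v} \in [0, \frac{1}{\mu_{\max}}]$	$u_f = 0$	$u_b = 0$	C
$\frac{L_{1}}{m v} > \frac{1}{\mu_{\min}}$	$u_f = 1$	$u_b = 0$	MA
$\frac{L_{1}}{m v} \in [\frac{1}{\mu_{\max}}, \frac{1}{\mu_{\min}}]$	$u_f = 1$ or $0$	$u_b = 0$	MA or C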
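
The case analysis in Table 1 can be read as a lookup from the value of $\frac{L_1}{m v}$ to a control mode. The sketch below is one possible encoding, under several assumptions: the function name and return convention are made up for illustration, $\mu_{\min}$ and $\mu_{\max}$ are treated as positive numbers with $\mu_{\min} \le \mu_{\max}$ (this page does not define them), and the shared endpoint $\frac{1}{\mu_{\max}}$ is assigned to the "MA or C" row because the table lists it in two rows.

```python
def driving_mode(ratio, mu_min, mu_max):
    """Map ratio = L1 / (m * v) to (u_f, u_b, mode) following Table 1.

    None marks a value the table leaves open: u_b anywhere in [0, 1] for
    partial braking, and u_f either 0 or 1 in the "MA or C" band.
    MA, C and MB are commonly read as maximum acceleration, coasting and
    maximum braking; the page itself only expands PB.
    """
    if not 0 < mu_min <= mu_max:
        raise ValueError("expected 0 < mu_min <= mu_max (assumption of this sketch)")
    lower, upper = 1.0 / mu_max, 1.0 / mu_min
    if ratio < 0:
        return (0.0, 1.0, "MB")
    if ratio == 0:
        return (0.0, None, "PB")
    if ratio < lower:
        return (0.0, 0.0, "C")
    if ratio > upper:
        return (1.0, 0.0, "MA")
    return (None, 0.0, "MA or C")


if __name__ == "__main__":
    # Hypothetical values of mu_min and mu_max, chosen only to exercise each row.
    for r in (-0.1, 0.0, 0.25, 1.0, 3.0):
        print(r, driving_mode(r, mu_min=0.5, mu_max=2.0))
```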



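Parameter	Value	Parameter	Value
Inter-station distance	1350 m	Train mass	200 t
Maximum traction force	250 kN	Maximum braking force	120 kN
Maximum line speed limit	22 m/s	Energy unit	0.5 kWh
Trip time of the initial strategy	141.13 s	Traction energy of the initial strategy	5.62 kWh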

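Scheduled trip time/s	Solution method	Actual trip time/s	Traction energy/kWh
100	DQN method	99.14	11.62
100	Numerical method	99.14	11.62
98	DQN method	97.80	12.12
98	Numerical method	97.80	12.12
96	DQN method	95.50	13.12
96	Numerical method	95.50	13.12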
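
The reported energies line up with the energy-unit idea from the abstract. Assuming the parameter table above (energy unit 0.5 kWh, initial strategy 5.62 kWh) describes the same experiment as this results table, which the page does not state explicitly, each reported traction energy equals the initial energy plus a whole number of units. The short check below is only arithmetic on the tabulated values; the variable and function names are illustrative.

```python
ENERGY_UNIT_KWH = 0.5        # "energy unit" in the parameter table
INITIAL_ENERGY_KWH = 5.62    # traction energy of the initial strategy

# Scheduled trip time (s) -> reported traction energy (kWh), identical for
# the DQN method and the numerical method in the table above.
reported = {100: 11.62, 98: 12.12, 96: 13.12}


def units_added(energy_kwh):
    """Number of energy units on top of the initial strategy (may be fractional)."""
    return (energy_kwh - INITIAL_ENERGY_KWH) / ENERGY_UNIT_KWH


units = {t: round(units_added(e)) for t, e in reported.items()}

if __name__ == "__main__":
    for t, e in reported.items():
        print(f"scheduled {t} s: {e} kWh = {INITIAL_ENERGY_KWH} kWh + {units[t]} units")
```

On these numbers, a tighter scheduled trip time corresponds to more allocated units, which matches the inverse-problem framing of spending the fewest units that still meet the schedule.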

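Section	Scheduled trip time/s	Actual trip time/s (numerical method)	Actual trip time/s (DQN method)	Actual traction energy/kWh (numerical method)	Actual traction energy/kWh (DQN method)
Songjiazhuang-Xiaocun	190	185.46	166.54	11.46	14.51
Xiaocun-Xiaohongmen	108	105.82	111.16	7.81	6.92
Xiaohongmen-Jiugong	157	154.51	172.77	12.85	8.94
Total	455	445.79	450.47	32.12	30.37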
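
To make the comparison in this table easier to read, the sketch below recomputes the totals and the relative energy difference between the two methods. The percentage is derived here from the tabulated values and is not a figure quoted from the paper; the data layout and names are illustrative.

```python
# Per section: (scheduled s, numerical time s, DQN time s, numerical kWh, DQN kWh)
sections = {
    "Songjiazhuang-Xiaocun": (190, 185.46, 166.54, 11.46, 14.51),
    "Xiaocun-Xiaohongmen": (108, 105.82, 111.16, 7.81, 6.92),
    "Xiaohongmen-Jiugong": (157, 154.51, 172.77, 12.85, 8.94),
}

scheduled_total = sum(row[0] for row in sections.values())
time_numerical = sum(row[1] for row in sections.values())
time_dqn = sum(row[2] for row in sections.values())
energy_numerical = sum(row[3] for row in sections.values())
energy_dqn = sum(row[4] for row in sections.values())

# Relative traction-energy reduction of the DQN result versus the numerical one.
saving_pct = 100.0 * (energy_numerical - energy_dqn) / energy_numerical

# Sections where the DQN trip time exceeds that section's own scheduled time.
over_schedule = [name for name, row in sections.items() if row[2] > row[0]]

if __name__ == "__main__":
    print(f"total time: numerical {time_numerical:.2f} s, DQN {time_dqn:.2f} s, "
          f"scheduled {scheduled_total} s")
    print(f"total energy: numerical {energy_numerical:.2f} kWh, DQN {energy_dqn:.2f} kWh")
    print(f"energy reduction: {saving_pct:.2f} %")
    print("DQN sections over their own schedule:", over_schedule)
```

On these values the DQN total stays within the 455 s overall schedule while using about 5.4% less traction energy, even though two individual sections run longer than their own scheduled times.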

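Related articles 2

[1]	Zhou YU, Jing BI, Haitao YUAN. Path planning method for complex naval battlefields based on an improved DQN algorithm[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(3): 418-425.
[2]	Yuming XIANG, Kun CHEN, Zhifeng ZHAO, Rongpeng LI, Honggang ZHANG. A swarm intelligence cooperative obstacle avoidance method inspired by the brain attention mechanism[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(1): 84-96.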