Cite this article: HAN Xin-chen, YU Sheng-ping, YUAN Zhi-ming, CHENG Li-juan. High-speed railway dynamic scheduling based on Q-learning method [J]. Control Theory & Applications, 2021, 38(10): 1511–1521.
High-speed railway dynamic scheduling based on Q-learning method
Received: 2020-09-10; Revised: 2021-09-09
DOI: 10.7641/CTA.2021.00612
2021, 38(10): 1511–1521
Keywords: high-speed railway; dynamic scheduling; reinforcement learning; Q-learning
Funding: Supported by the National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269) and the Natural Science Foundation of Liaoning Province (2020–MS–093).
Author, affiliation, e-mail:
HAN Xin-chen, Northeastern University, xinchenhan@126.com
YU Sheng-ping* (corresponding author), Northeastern University, spyu@mail.neu.edu.cn
YUAN Zhi-ming, Signal and Communication Research Institute, China Academy of Railway Sciences Corporation Limited
CHENG Li-juan, Northeastern University
Abstract (Chinese)
      As the backbone of the national comprehensive transportation system, high-speed railway has developed rapidly over the past decade. This rapid growth has in turn produced increasingly complex rail networks spread over wide areas, which place higher demands on high-speed railway dynamic scheduling. The uncertainty of unexpected events can delay trains, and such delays may even propagate along the network, causing large-scale late arrivals and departures. The manual dispatching currently applied to this problem offers poor foresight and specificity, making it difficult to adjust affected trains quickly. To address these problems, this paper establishes a high-speed railway train dynamic scheduling model whose objective function is to minimize the total delay time of all trains at all stations. On this basis, a simulation environment for interaction with the agent is designed, and the Q-learning algorithm from reinforcement learning is adopted to solve the model. Finally, simulation examples verify the rationality of the simulation environment and the effectiveness of the Q-learning algorithm for high-speed railway dynamic scheduling, providing dispatchers with a sound basis for optimized decisions.
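The total-delay objective stated above can be made concrete. The following is a minimal sketch in notation assumed here (this page does not give the paper's own symbols): trains are indexed i = 1, ..., N and stations j = 1, ..., M; a_{i,j} and d_{i,j} are actual arrival and departure times, and the barred quantities are the timetabled ones.

\min \; Z \;=\; \sum_{i=1}^{N}\sum_{j=1}^{M}\Bigl[\max\bigl(0,\; a_{i,j}-\bar{a}_{i,j}\bigr) \;+\; \max\bigl(0,\; d_{i,j}-\bar{d}_{i,j}\bigr)\Bigr]

The max(0, ·) terms prevent early running from being counted as negative delay; whether the paper's model counts arrival delays, departure delays, or both is not stated on this page.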
Abstract (English)
      As the backbone of the national comprehensive transportation system, high-speed railway has developed rapidly and vigorously in the past decade. At the same time, this rapid development has produced increasingly complicated rail networks distributed over wide areas, which place higher requirements on high-speed railway scheduling. Unexpected events can delay trains, and the delay may even spread along the rail network, causing trains over a large area to arrive or depart late. However, manual scheduling is poorly forward-looking and poorly targeted, and it is difficult to adjust the affected trains quickly. In view of the above problems, this paper establishes a high-speed railway dynamic scheduling model that takes the minimum sum of delay times of all trains at all stations as the objective function. Based on this model, a simulation environment for interacting with the agent is designed, and the Q-learning algorithm is used to solve the model. Finally, simulation examples verify the rationality of the simulation environment and the effectiveness of the Q-learning algorithm for the dynamic scheduling problem, providing a good basis for dispatchers to make more optimal decisions.
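For readers unfamiliar with the method, tabular Q-learning with an epsilon-greedy policy is the standard formulation this class of approaches builds on. Below is a minimal, self-contained Python sketch of that update rule. The environment interface (reset, step, available_actions), the state and action encoding, and all hyperparameter values are illustrative assumptions, not the paper's implementation.

import random
from collections import defaultdict

def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning sketch. `env` is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done),
    and available_actions(state) -> non-empty list; states must be
    hashable. Hyperparameters are illustrative, not the paper's values."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy selection over the actions feasible in this state
            actions = env.available_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: bootstrap from the best action in the next state
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.available_actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q

In the scheduling setting, a plausible reward is the negative increment of total delay after each dispatching decision, so that maximizing return minimizes the objective above; that reward design is likewise an assumption, since this page does not specify the paper's reward function.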