Cite this article: HAN Xin-chen, YU Sheng-ping, YUAN Zhi-ming, CHENG Li-juan. High-speed railway dynamic scheduling based on Q-learning method [J]. Control Theory & Applications, 2021, 38(10): 1511–1521.
High-speed railway dynamic scheduling based on Q-learning method
Received: 2020-09-10; Revised: 2021-09-09
DOI: 10.7641/CTA.2021.00612
2021, 38(10): 1511–1521
Keywords: high-speed railway; dynamic scheduling; reinforcement learning; Q-learning
Funding: Supported by the National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269) and the Natural Science Foundation of Liaoning Province (2020–MS–093).
Author, affiliation, e-mail:
HAN Xin-chen, Northeastern University, xinchenhan@126.com
YU Sheng-ping* (corresponding author), Northeastern University, spyu@mail.neu.edu.cn
YUAN Zhi-ming, Signal and Communication Research Institute, China Academy of Railway Sciences Corporation Limited
CHENG Li-juan, Northeastern University
Abstract (Chinese)
      As the backbone of the national comprehensive transportation system, high-speed railway has developed rapidly over the past decade. This rapid growth has in turn produced increasingly complex rail networks spread over wide areas, which place higher demands on high-speed railway dynamic scheduling. The uncertainty of unexpected events can delay trains, and such delays may even propagate along the network, causing large-scale late arrivals and departures. The manual dispatching currently applied to this problem offers poor foresight and specificity, making it difficult to adjust affected trains quickly. To address these problems, this paper establishes a high-speed railway train dynamic scheduling model whose objective function is to minimize the total delay time of all trains at all stations. On this basis, a simulation environment for interaction with the agent is designed, and the Q-learning algorithm from reinforcement learning is adopted to solve the model. Finally, simulation examples verify the rationality of the simulation environment and the effectiveness of the Q-learning algorithm for high-speed railway dynamic scheduling, providing dispatchers with a sound basis for optimized decisions.
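The total-delay objective stated above can be made concrete. The following is a minimal sketch in notation assumed here (this page does not give the paper's own symbols): trains are indexed i = 1, ..., N and stations j = 1, ..., M; a_{i,j} and d_{i,j} are actual arrival and departure times, and the barred quantities are the timetabled ones.

\min \; Z \;=\; \sum_{i=1}^{N}\sum_{j=1}^{M}\Bigl[\max\bigl(0,\; a_{i,j}-\bar{a}_{i,j}\bigr) \;+\; \max\bigl(0,\; d_{i,j}-\bar{d}_{i,j}\bigr)\Bigr]

The max(0, ·) terms prevent early running from being counted as negative delay; whether the paper's model counts arrival delays, departure delays, or both is not stated on this page.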
Abstract (English)
      As the backbone of the national comprehensive transportation system, high-speed railway has developed rapidly and vigorously in the past decade. At the same time, this rapid development has produced increasingly complicated rail networks distributed over wide areas, which place higher requirements on high-speed railway scheduling. Unexpected events can delay trains, and the delay may even spread along the rail network, causing trains over a large area to arrive or depart late. However, manual scheduling is poorly forward-looking and poorly targeted, and it is difficult to adjust the affected trains quickly. In view of the above problems, this paper establishes a high-speed railway dynamic scheduling model that takes the minimum sum of delay times of all trains at all stations as the objective function. Based on this model, a simulation environment for interacting with the agent is designed, and the Q-learning algorithm is used to solve the model. Finally, simulation examples verify the rationality of the simulation environment and the effectiveness of the Q-learning algorithm for the dynamic scheduling problem, providing a good basis for dispatchers to make more optimal decisions.
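For readers unfamiliar with the method, tabular Q-learning with an epsilon-greedy policy is the standard formulation this class of approaches builds on. Below is a minimal, self-contained Python sketch of that update rule. The environment interface (reset, step, available_actions), the state and action encoding, and all hyperparameter values are illustrative assumptions, not the paper's implementation.

import random
from collections import defaultdict

def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning sketch. `env` is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done),
    and available_actions(state) -> non-empty list; states must be
    hashable. Hyperparameters are illustrative, not the paper's values."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy selection over the actions feasible in this state
            actions = env.available_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: bootstrap from the best action in the next state
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.available_actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q

In the scheduling setting, a plausible reward is the negative increment of total delay after each dispatching decision, so that maximizing return minimizes the objective above; that reward design is likewise an assumption, since this page does not specify the paper's reward function.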