Cite this article: SU Zhang-sheng, JIANG Sheng-long, PENG Gong-zhuang, LIANG Yue-yong. Hierarchical reinforcement learning-based optimization method for crane scheduling [J]. Control Theory & Applications, 2025, 42(11): 2261-2273.
|
| Hierarchical reinforcement learning-based optimization method for crane scheduling |
Received: 2025-04-17    Revised: 2025-10-29
DOI: 10.7641/CTA.2025.50167
| 2025,42(11):2261-2273 |
Keywords: crane scheduling; hierarchical reinforcement learning; task assignment; path planning; action tabu
Funding: Supported by the National Natural Science Foundation of China (92367106, 62273032, 61873042).
|
Abstract
Cranes are key heavy-duty material handling equipment widely used in shops, warehouses, ports, and other industrial settings, and their scheduling significantly affects transportation efficiency and the achievement of production goals. To address the crane scheduling problem with time windows (CSP-TW), a mixed-integer linear programming model based on spatio-temporal discretization is developed. Building on the characteristics of this model, a hierarchical reinforcement learning (HRL) decision-making framework is designed: the high-level decision network assigns transportation tasks to appropriate cranes, while the low-level network plans a path for each crane to complete its assigned task. During learning, action tabu rules are introduced to rule out ineffective actions and guide both decision networks toward the dominant policy space. An external experience pool and a dueling double deep Q-network (D3QN) strategy are then adopted to train the decision networks. Tests were conducted on a company's steel-plant logistics simulation platform. Ablation experiments show that the action tabu rules improve HRL learning efficiency; training comparisons indicate that HRL converges better than an end-to-end framework; and comparative experiments demonstrate that HRL outperforms multi-rule combinations, meta-heuristic algorithms, end-to-end learning, and the deep Q-network (DQN), while satisfying second-level response-time requirements for applications.
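The abstract's action tabu rules mask out invalid actions (e.g. crane moves that would violate a time window) during Q-network action selection. The following is a minimal illustrative sketch of tabu-aware greedy selection, not the authors' implementation; the function name and the 4-action example are hypothetical.

```python
import numpy as np

def masked_greedy_action(q_values, tabu_mask):
    """Pick the highest-Q action whose tabu flag is False.

    q_values  : 1-D array of Q-value estimates, one per candidate action.
    tabu_mask : boolean array; True marks an action forbidden by a tabu
                rule, so it is excluded before the argmax.
    """
    q = np.where(tabu_mask, -np.inf, q_values)  # rule out tabu actions
    if np.all(np.isneginf(q)):
        raise ValueError("all actions are tabu in this state")
    return int(np.argmax(q))

# Hypothetical example: 4 candidate crane moves, moves 0 and 2 tabu.
q = np.array([2.0, 1.5, 3.0, 0.5])
mask = np.array([True, False, True, False])
print(masked_greedy_action(q, mask))  # prints 1: best non-tabu action
```

Masking before the argmax (rather than penalizing invalid actions through the reward) keeps the agent's exploration inside the feasible policy space, which matches the abstract's stated purpose of guiding both decision networks toward the dominant policy space.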
|
|
|
|
|