Cite this article: BAO Hai-zhu, PAN Quan-ke. Dynamic cascading workshop energy-efficiency scheduling optimization based on deep reinforcement learning[J]. Control Theory & Applications, 2025, 42(11): 2310-2321.
Dynamic cascading workshop energy-efficiency scheduling optimization based on deep reinforcement learning
Received: 2025-03-26  Revised: 2025-10-13
DOI: 10.7641/CTA.2025.50119
2025, 42(11): 2310-2321
Keywords: cascading workshop scheduling; flow shop; dynamic scheduling; deep reinforcement learning; graph neural network
Funding: Supported by the National Natural Science Foundation of China (62273221) and the Aeronautical Science Foundation of China (2024Z0710S6002).
Authors, affiliations, and e-mail:
BAO Hai-zhu   School of Mechatronic Engineering and Automation, Shanghai University   bhz1758695032@qq.com
PAN Quan-ke*  School of Mechatronic Engineering and Automation, Shanghai University   panquanke@shu.edu.cn
Chinese Abstract (translated)
      This paper addresses the dynamic cascading workshop energy-efficiency scheduling problem (DCSESP) in a cascaded shop system composed of a distributed flow shop, a hybrid flow shop, and the transportation stage between them, with the objective of minimizing total order tardiness and total energy consumption. A mixed-integer linear programming model is established, and a deep reinforcement learning algorithm based on graph neural networks (GDRL) is proposed. First, a heterogeneous graph model is designed for the cascading workshop scheduling problem, and a three-stage node embedding method is constructed to capture the dynamic state features of the shop floor in real time. Second, based on the extracted state features, job and operation selection decisions are made separately for the two stages of the cascaded shops. Finally, a multilayer perceptron (MLP) and a graph attention network (GAT) are combined within the actor-critic framework of the proximal policy optimization (PPO) algorithm for training and learning, enabling fast joint scheduling of the cascaded shops. Experimental results show that, in solving the dynamic cascading workshop energy-efficiency scheduling problem, the GDRL algorithm significantly outperforms three other state-of-the-art scheduling methods, exhibiting stronger optimization capability and robustness, particularly in complex problem scenarios.
英文摘要
      This study addresses the dynamic cascading workshop energy-efficiency scheduling problem (DCSESP), which arises in a system consisting of distributed flow shops (DFS), hybrid flow shops (HFS), and the transportation stage between them. The objective is to minimize the total tardiness and total energy consumption. To this end, a mixed-integer linear programming (MILP) model is developed, and a graph-based deep reinforcement learning (GDRL) algorithm is proposed. First, a heterogeneous graph model is designed for the DCSESP, and a three-stage node embedding method is introduced to capture real-time shop-floor state features. Second, based on these state features, the algorithm directly selects jobs and operations for both stages of the cascaded dual-shop system. Finally, a multilayer perceptron (MLP) and a graph attention network (GAT) are integrated into the actor-critic framework of the proximal policy optimization (PPO) algorithm to facilitate learning and decision-making for rapid joint scheduling. Experimental results demonstrate that the proposed GDRL algorithm outperforms three state-of-the-art scheduling methods in solving the DCSESP, particularly in complex scheduling scenarios, where it achieves higher optimization performance and robustness.
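The abstract's actor-critic training rests on PPO's clipped surrogate objective, which bounds how far each policy update can move from the behavior policy. The following NumPy sketch is illustrative only: the probability ratios and advantage estimates are made-up toy values, and it does not reproduce the paper's GDRL implementation or its GAT/MLP networks.

```python
import numpy as np

def ppo_clipped_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage : advantage estimates for the same actions
    eps       : clipping range; updates beyond 1 +/- eps gain nothing
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (smaller) term per sample, then negate the mean
    # so that gradient descent maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: three sampled actions with hypothetical ratios and advantages.
ratio = np.array([0.9, 1.1, 1.5])
adv = np.array([1.0, -0.5, 2.0])
loss = ppo_clipped_loss(ratio, adv)  # third sample is clipped at ratio 1.2
```

In the toy batch, the third sample's ratio 1.5 exceeds 1 + eps = 1.2, so its contribution is capped at 1.2 * 2.0, which is exactly the mechanism that keeps the scheduling policy's updates stable during training.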