Cite this article: BAO Hai-zhu, PAN Quan-ke. Dynamic cascading workshop energy-efficiency scheduling optimization based on deep reinforcement learning[J]. Control Theory & Applications, 2025, 42(11): 2310-2321.
Dynamic cascading workshop energy-efficiency scheduling optimization based on deep reinforcement learning
Received: 2025-03-26  Revised: 2025-10-13
DOI: 10.7641/CTA.2025.50119
2025, 42(11): 2310-2321
Keywords: cascading workshop scheduling; flow shop; dynamic scheduling; deep reinforcement learning; graph neural network
Funding: Supported by the National Natural Science Foundation of China (62273221) and the Aeronautical Science Foundation of China (2024Z0710S6002).
Authors, affiliations, and e-mail:
BAO Hai-zhu   School of Mechatronic Engineering and Automation, Shanghai University   bhz1758695032@qq.com
PAN Quan-ke*  School of Mechatronic Engineering and Automation, Shanghai University   panquanke@shu.edu.cn
Chinese Abstract (translated)
      This paper addresses the dynamic cascading workshop energy-efficiency scheduling problem (DCSESP) in a cascaded shop system composed of a distributed flow shop, a hybrid flow shop, and the transportation stage between them, with the objective of minimizing total order tardiness and total energy consumption. A mixed-integer linear programming model is established, and a deep reinforcement learning algorithm based on graph neural networks (GDRL) is proposed. First, a heterogeneous graph model is designed for the cascading workshop scheduling problem, and a three-stage node embedding method is constructed to capture the dynamic state features of the shop floor in real time. Second, based on the extracted state features, job and operation selection decisions are made separately for the two stages of the cascaded shops. Finally, a multilayer perceptron (MLP) and a graph attention network (GAT) are combined within the actor-critic framework of the proximal policy optimization (PPO) algorithm for training and learning, enabling fast joint scheduling of the cascaded shops. Experimental results show that, in solving the dynamic cascading workshop energy-efficiency scheduling problem, the GDRL algorithm significantly outperforms three other state-of-the-art scheduling methods, exhibiting stronger optimization capability and robustness, particularly in complex problem scenarios.
英文摘要
      This study addresses the dynamic cascading workshop energy-efficiency scheduling problem (DCSESP), which arises in a system consisting of distributed flow shops (DFS), hybrid flow shops (HFS), and the transportation stage between them. The objective is to minimize the total tardiness and total energy consumption. To this end, a mixed-integer linear programming (MILP) model is developed, and a graph-based deep reinforcement learning (GDRL) algorithm is proposed. First, a heterogeneous graph model is designed for the DCSESP, and a three-stage node embedding method is introduced to capture real-time shop-floor state features. Second, based on these state features, the algorithm directly selects jobs and operations for both stages of the cascaded dual-shop system. Finally, a multilayer perceptron (MLP) and a graph attention network (GAT) are integrated into the actor-critic framework of the proximal policy optimization (PPO) algorithm to facilitate learning and decision-making for rapid joint scheduling. Experimental results demonstrate that the proposed GDRL algorithm outperforms three state-of-the-art scheduling methods in solving the DCSESP, particularly in complex scheduling scenarios, where it achieves higher optimization performance and robustness.
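The abstract's actor-critic training rests on PPO's clipped surrogate objective, which bounds how far each policy update can move from the behavior policy. The following NumPy sketch is illustrative only: the probability ratios and advantage estimates are made-up toy values, and it does not reproduce the paper's GDRL implementation or its GAT/MLP networks.

```python
import numpy as np

def ppo_clipped_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage : advantage estimates for the same actions
    eps       : clipping range; updates beyond 1 +/- eps gain nothing
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (smaller) term per sample, then negate the mean
    # so that gradient descent maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: three sampled actions with hypothetical ratios and advantages.
ratio = np.array([0.9, 1.1, 1.5])
adv = np.array([1.0, -0.5, 2.0])
loss = ppo_clipped_loss(ratio, adv)  # third sample is clipped at ratio 1.2
```

In the toy batch, the third sample's ratio 1.5 exceeds 1 + eps = 1.2, so its contribution is capped at 1.2 * 2.0, which is exactly the mechanism that keeps the scheduling policy's updates stable during training.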