Cite this article: YANG Xiao-yu, HAN Yu-yan, WANG Yu-ting, LI Huan, ZHANG Biao. A fluid relaxation model-integrated dueling double deep Q-network for solving the dynamic multiplicity flexible job-shop scheduling problem[J]. Control Theory & Applications, 2025, 42(11): 2332-2340.
A fluid relaxation model-integrated dueling double deep Q-network for solving the dynamic multiplicity flexible job-shop scheduling problem
Received: 2025-03-04  Revised: 2025-09-21
DOI: 10.7641/CTA.2025.50083
2025, 42(11): 2332-2340
Keywords: flexible manufacturing systems; simplified fluid relaxation model; deep reinforcement learning; multi-policy dueling double deep Q-network; multiplicity
Funding: Supported by the National Natural Science Foundation of China (61973203, 61803192, 62106073, 61966012), the Natural Science Foundation of Shandong Province (ZR2023MF022, ZR2024MF112), and the Guangyue Youth Innovation Team Project of Liaocheng University (LCUGYTD2022-03).
Author  Affiliation  E-mail
YANG Xiao-yu  School of Computer Science, Liaocheng University  yang_xiaoyu01@163.com
HAN Yu-yan*  School of Computer Science, Liaocheng University  hanyuyan@lcu-cs.com
WANG Yu-ting  School of Computer Science, Liaocheng University
LI Huan  School of Computer Science, Liaocheng University
ZHANG Biao  School of Computer Science, Liaocheng University
Abstract
      Because orders arrive randomly during production, the scheduling plan being executed may become suboptimal, making the real-time dynamic arrival of orders a critical issue. To address the dynamic multiplicity flexible job-shop scheduling problem (DMFJSP), which involves the dynamic arrival of job orders and of multiple job types, a multi-policy dueling double deep Q-network (MPD3QN) solution method is proposed. First, to reduce the complexity of the DMFJSP, a simplified fluid relaxation model is introduced, and a multi-criteria selection strategy based on this model is designed to support production scheduling decisions. Second, a Markov decision process (MDP) model is constructed: 19 state features related to jobs and machines are extracted, and 20 composite rules are designed to form the action space. Then, the MPD3QN algorithm is formulated by integrating prioritized experience replay, a soft update mechanism, and an adaptive action selection strategy. Finally, the proposed method is evaluated on 81 test instances against three existing deep reinforcement learning scheduling approaches, and the simulation results confirm its superior scheduling efficiency and robustness.
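Two of the training components named in the abstract, prioritized experience replay and the soft (Polyak) target-network update, can be illustrated with a minimal, hypothetical sketch in plain Python. This is not the authors' implementation: the class and function names, the proportional-priority scheme (sampling probability ∝ priority^α with importance-sampling weights), and all parameter values are illustrative assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay: P(i) ∝ priority_i ** alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []        # stored transitions (state, action, reward, next_state, ...)
        self.priorities = []  # one priority per stored transition
        self.pos = 0          # circular write index

    def push(self, transition):
        # New transitions receive the current max priority so each is replayed at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probabilities from scaled priorities.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling,
        # normalized by the max weight so all weights lie in (0, 1].
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priorities track the magnitude of the TD error (plus eps to stay positive).
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = abs(e) + eps

def soft_update(target_params, online_params, tau=0.005):
    """Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]
```

In a full D3QN training loop the sampled importance weights would scale each transition's TD loss, and `update_priorities` would be called with the fresh TD errors after every gradient step, while `soft_update` replaces the periodic hard copy of online weights into the target network.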