Cite this article: YANG Xiao-yu, HAN Yu-yan, WANG Yu-ting, LI Huan, ZHANG Biao. A fluid relaxation model-integrated dueling double deep Q-network for solving the dynamic multiplicity flexible job-shop scheduling problem[J]. Control Theory & Applications, 2025, 42(11): 2332-2340.
A fluid relaxation model-integrated dueling double deep Q-network for solving the dynamic multiplicity flexible job-shop scheduling problem
Received: 2025-03-04  Revised: 2025-09-21
DOI: 10.7641/CTA.2025.50083
2025, 42(11): 2332-2340
Keywords: flexible manufacturing systems; simplified fluid relaxation model; deep reinforcement learning; multi-policy dueling double deep Q-network; multiplicity
Funding: Supported by the National Natural Science Foundation of China (61973203, 61803192, 62106073, 61966012), the Natural Science Foundation of Shandong Province (ZR2023MF022, ZR2024MF112), and the Guangyue Youth Innovation Team Project of Liaocheng University (LCUGYTD2022-03).
Author  Affiliation  E-mail
YANG Xiao-yu  School of Computer Science, Liaocheng University  yang_xiaoyu01@163.com
HAN Yu-yan*  School of Computer Science, Liaocheng University  hanyuyan@lcu-cs.com
WANG Yu-ting  School of Computer Science, Liaocheng University
LI Huan  School of Computer Science, Liaocheng University
ZHANG Biao  School of Computer Science, Liaocheng University
Abstract
      Because orders arrive randomly during production, the scheduling plan being executed may become suboptimal, making the real-time dynamic arrival of orders a critical issue. To address the dynamic multiplicity flexible job-shop scheduling problem (DMFJSP), which involves the dynamic arrival of job orders and of multiple job types, a multi-policy dueling double deep Q-network (MPD3QN) solution method is proposed. First, to reduce the complexity of the DMFJSP, a simplified fluid relaxation model is introduced, and a multi-criteria selection strategy based on this model is designed to support production scheduling decisions. Second, a Markov decision process (MDP) model is constructed: 19 state features related to jobs and machines are extracted, and 20 composite rules are designed to form the action space. Then, the MPD3QN algorithm is formulated by integrating prioritized experience replay, a soft update mechanism, and an adaptive action selection strategy. Finally, the proposed method is evaluated on 81 test instances against three existing deep reinforcement learning scheduling approaches, and the simulation results confirm its superior scheduling efficiency and robustness.
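Two of the training components named in the abstract, prioritized experience replay and the soft (Polyak) target-network update, can be illustrated with a minimal, hypothetical sketch in plain Python. This is not the authors' implementation: the class and function names, the proportional-priority scheme (sampling probability ∝ priority^α with importance-sampling weights), and all parameter values are illustrative assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay: P(i) ∝ priority_i ** alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []        # stored transitions (state, action, reward, next_state, ...)
        self.priorities = []  # one priority per stored transition
        self.pos = 0          # circular write index

    def push(self, transition):
        # New transitions receive the current max priority so each is replayed at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probabilities from scaled priorities.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling,
        # normalized by the max weight so all weights lie in (0, 1].
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priorities track the magnitude of the TD error (plus eps to stay positive).
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = abs(e) + eps

def soft_update(target_params, online_params, tau=0.005):
    """Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]
```

In a full D3QN training loop the sampled importance weights would scale each transition's TD loss, and `update_priorities` would be called with the fresh TD errors after every gradient step, while `soft_update` replaces the periodic hard copy of online weights into the target network.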