Cite this article: 初阳,燕雪峰,张玄烨,徐云雯,李德伟.一类三维装箱问题的多智能体分层强化学习求解算法研究[J].控制理论与应用,2025,42(12):2569~2576.
CHU Yang,YAN Xue-feng,ZHANG Xuan-ye,XU Yun-wen,LI De-wei.Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem[J].Control Theory & Applications,2025,42(12):2569~2576.
一类三维装箱问题的多智能体分层强化学习求解算法研究
Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem
Received: 2025-02-17  Revised: 2025-09-08
DOI  10.7641/CTA.2025.50058
  2025,42(12):2569-2576
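Chinese keywords  三维装箱问题 (three-dimensional bin packing problem)  深度强化学习 (deep reinforcement learning)  多智能体强化学习 (multi-agent reinforcement learning)  组合优化 (combinatorial optimization)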
English keywords  three-dimensional packing problem  deep reinforcement learning  multi-agent reinforcement learning  combinatorial optimization
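Author  Affiliation  E-mail
CHU Yang (初阳)  College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics  chuyang_716@163.com
YAN Xue-feng (燕雪峰)  College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
ZHANG Xuan-ye (张玄烨)  Department of Automation, Shanghai Jiao Tong University
XU Yun-wen (徐云雯)  Department of Automation, Shanghai Jiao Tong University
LI De-wei* (李德伟)  Department of Automation, Shanghai Jiao Tong University  dwli@sjtu.edu.cn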
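Chinese abstract (translated)
      To improve decision efficiency and space utilization in the multi-bin three-dimensional bin packing problem (3D-BPP) under semi-online settings, this paper proposes a multi-agent hierarchical reinforcement learning algorithm. The algorithm models the problem as a multi-agent Markov decision process (MAMDP) in which three fully cooperative agents handle item selection, bin selection, and placement planning, respectively, and it introduces a value-distribution learning method to improve stability and convergence. Experiments show that the algorithm performs well under different environment configurations, with significantly higher space utilization and number of packed items, and that it generalizes well in multi-bin and multi-item selection scenarios. Compared with traditional heuristic algorithms, it has clear advantages in dynamic decision-making and adaptability, and is notably robust when item sizes follow an unknown distribution. The algorithm is the first to apply a multi-agent hierarchical reinforcement learning framework to the 3D-BPP, achieving end-to-end optimization of packing decisions and offering a novel solution for complex packing scenarios.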
English abstract
      To address the complexity of the three-dimensional bin packing problem (3D-BPP) in multi-bin, semi-online scenarios, a multi-agent hierarchical reinforcement learning algorithm is proposed to improve packing efficiency and space utilization. The algorithm models the problem as a multi-agent Markov decision process (MAMDP) with three fully cooperative agents responsible for item selection, bin selection, and placement planning, respectively. A distributional learning method is introduced to enhance the stability and convergence of the algorithm. Experimental results show that the algorithm performs well across various environmental configurations, significantly improving space utilization and the number of packed items. It also generalizes well in multi-bin and multi-item selection scenarios. Compared with traditional heuristic algorithms, the proposed method has clear advantages in dynamic decision-making and adaptive optimization, and is particularly robust when handling items with unknown size distributions. The innovation lies in the first application of a multi-agent hierarchical reinforcement learning framework to the 3D-BPP, achieving end-to-end optimization of packing decisions and providing a novel solution for complex packing scenarios.
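The three-level decision described in the abstract (item selection, then bin selection, then placement planning) can be illustrated with a minimal Python sketch. This is not the authors' method: the learned agents are replaced here by simple hand-written rules, and every name (`Bin`, `select_item`, `select_bin`, `select_position`, `pack`, `lookahead_size`) is hypothetical. The sketch also assumes that "semi-online" means the packer can choose among a small window of upcoming items, uses integer grid coordinates, and checks only containment and overlap, not physical stability.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Item = Tuple[int, int, int]  # (dx, dy, dz) edge lengths, fixed orientation


@dataclass
class Bin:
    size: Item
    # Each entry is (x, y, z, dx, dy, dz) for one placed item.
    placed: List[Tuple[int, int, int, int, int, int]] = field(default_factory=list)

    def fits(self, pos: Tuple[int, int, int], item: Item) -> bool:
        """True if the item stays inside the bin and overlaps no placed item."""
        x, y, z = pos
        dx, dy, dz = item
        bx, by, bz = self.size
        if x + dx > bx or y + dy > by or z + dz > bz:
            return False
        for px, py, pz, pdx, pdy, pdz in self.placed:
            if (x < px + pdx and px < x + dx and
                    y < py + pdy and py < y + dy and
                    z < pz + pdz and pz < z + dz):
                return False
        return True

    def utilization(self) -> float:
        """Packed volume divided by bin volume."""
        bx, by, bz = self.size
        used = sum(dx * dy * dz for _, _, _, dx, dy, dz in self.placed)
        return used / (bx * by * bz)


def select_item(window: List[Item]) -> int:
    # Stand-in for the item-selection agent: pick the largest visible item.
    return max(range(len(window)), key=lambda i: window[i][0] * window[i][1] * window[i][2])


def select_bin(bins: List[Bin]) -> List[int]:
    # Stand-in for the bin-selection agent: rank bins fullest first.
    return sorted(range(len(bins)), key=lambda b: -bins[b].utilization())


def select_position(bin_: Bin, item: Item) -> Optional[Tuple[int, int, int]]:
    # Stand-in for the placement agent: lowest feasible grid position.
    bx, by, bz = bin_.size
    for z in range(bz):
        for y in range(by):
            for x in range(bx):
                if bin_.fits((x, y, z), item):
                    return (x, y, z)
    return None


def pack(items: List[Item], bins: List[Bin], lookahead_size: int = 2) -> int:
    """Run the three decisions in sequence; return the number of packed items."""
    queue = list(items)
    packed = 0
    while queue:
        item = queue.pop(select_item(queue[:lookahead_size]))
        for b in select_bin(bins):
            pos = select_position(bins[b], item)
            if pos is not None:
                bins[b].placed.append((*pos, *item))
                packed += 1
                break
        else:
            break  # no bin can take the chosen item: the episode ends
    return packed
```

In a learned version, each of the three `select_*` functions would presumably be replaced by a policy trained on a shared reward such as space utilization; the sequential call order in `pack` is what makes the decision hierarchical in this sketch.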