Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem

CHU Yang; YAN Xue-feng; ZHANG Xuan-ye; XU Yun-wen; LI De-wei

Cite this article: CHU Yang, YAN Xue-feng, ZHANG Xuan-ye, XU Yun-wen, LI De-wei. Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem[J]. Control Theory & Applications, 2025, 42(12): 2569-2576.


Received: 2025-02-17 Revised: 2025-09-08


DOI: 10.7641/CTA.2025.50058

2025,42(12):2569-2576

Keywords: three-dimensional packing problem; deep reinforcement learning; multi-agent reinforcement learning; combinatorial optimization

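Author	Affiliation	E-mail
CHU Yang	College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics	chuyang_716@163.com
YAN Xue-feng	College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
ZHANG Xuan-ye	Department of Automation, Shanghai Jiao Tong University
XU Yun-wen	Department of Automation, Shanghai Jiao Tong University
LI De-wei^*	Department of Automation, Shanghai Jiao Tong University	dwli@sjtu.edu.cn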

Abstract

To improve decision efficiency and space utilization in the multi-bin three-dimensional bin packing problem (3D-BPP) under semi-online settings, this paper proposes a multi-agent hierarchical reinforcement learning algorithm. The problem is modeled as a multi-agent Markov decision process (MAMDP) in which three fully cooperative agents are responsible for item selection, bin selection, and placement planning, respectively. A distributional value learning method is introduced to enhance the stability and convergence of the algorithm. Experimental results show that the algorithm performs well across different environment configurations, significantly improving space utilization and the number of packed items, and that it generalizes well to multi-bin and multi-item selection scenarios. Compared with traditional heuristic algorithms, the proposed method has clear advantages in dynamic decision-making and adaptability, and it is particularly robust when item sizes follow unknown distributions. The work is the first to apply a multi-agent hierarchical reinforcement learning framework to the 3D-BPP, achieving end-to-end optimization of packing decisions and providing a novel solution for complex packing scenarios.
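The decision order named in the abstract (item selection, then bin selection, then placement planning) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the three learned agents are replaced here by simple hand-written rules (largest-volume item first, lowest landing height for bin and position), and the height-map bin representation, the lookahead buffer, the absence of item rotation, and all names (`Bin`, `step`, `pack_all`) are assumptions introduced purely for illustration. Only the three-stage structure and the space-utilization metric correspond to the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Item = Tuple[int, int, int]  # (length, width, height) in integer grid units


@dataclass
class Bin:
    """Hypothetical bin model: a height map over an integer grid."""
    length: int
    width: int
    height: int
    heights: List[List[int]] = field(init=False)  # top surface per grid cell
    packed_volume: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.heights = [[0] * self.width for _ in range(self.length)]

    def landing_height(self, item: Item, x: int, y: int) -> Optional[int]:
        """Height at which `item` would rest at (x, y), or None if it does not fit."""
        l, w, h = item
        if x + l > self.length or y + w > self.width:
            return None
        base = max(self.heights[i][j]
                   for i in range(x, x + l) for j in range(y, y + w))
        return base if base + h <= self.height else None

    def place(self, item: Item, x: int, y: int) -> None:
        l, w, h = item
        base = self.landing_height(item, x, y)
        assert base is not None, "item does not fit at this position"
        for i in range(x, x + l):
            for j in range(y, y + w):
                self.heights[i][j] = base + h
        self.packed_volume += l * w * h

    def utilization(self) -> float:
        """Packed volume divided by bin volume."""
        return self.packed_volume / (self.length * self.width * self.height)


def best_position(b: Bin, item: Item) -> Optional[Tuple[int, int, int]]:
    """Placement stand-in: lowest landing height, ties broken by scan order."""
    best = None
    for x in range(b.length):
        for y in range(b.width):
            base = b.landing_height(item, x, y)
            if base is not None and (best is None or base < best[0]):
                best = (base, x, y)
    return best


def step(bins: List[Bin], buffer: List[Item]) -> bool:
    """One hierarchical decision: item -> bin -> position. False if nothing fits."""
    # Item-selection stand-in: try visible items in order of decreasing volume.
    order = sorted(range(len(buffer)),
                   key=lambda i: buffer[i][0] * buffer[i][1] * buffer[i][2],
                   reverse=True)
    for idx in order:
        item = buffer[idx]
        # Bin-selection stand-in: the bin offering the lowest landing height.
        candidates = []
        for b_idx, b in enumerate(bins):
            pos = best_position(b, item)
            if pos is not None:
                candidates.append((pos[0], b_idx, pos[1], pos[2]))
        if candidates:
            _, b_idx, x, y = min(candidates)
            bins[b_idx].place(item, x, y)
            buffer.pop(idx)
            return True
    return False


def pack_all(bins: List[Bin], buffer: List[Item]) -> int:
    """Repeat decisions until the buffer is empty or nothing fits; return items packed."""
    packed = 0
    while buffer and step(bins, buffer):
        packed += 1
    return packed
```

For example, with two 2x2x2 bins and a buffer of `(2, 2, 1)`, `(1, 1, 1)`, and `(2, 2, 2)`, `pack_all` places all three items and the two bins end with utilizations of 1.0 and 0.625. In a learned version of this structure, each of the three rule-based choices would be replaced by a policy trained jointly with the others.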