| Cite this article: | 初阳, 燕雪峰, 张玄烨, 徐云雯, 李德伟 (CHU Yang, YAN Xue-feng, ZHANG Xuan-ye, XU Yun-wen, LI De-wei). Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem [J]. Control Theory & Applications (控制理论与应用), 2025, 42(12): 2569-2576. |
|
| Research on multi-agent hierarchical reinforcement learning algorithm for solving one type of 3D bin packing problem |
| Received: 2025-02-17 Revised: 2025-09-08 |
| DOI 10.7641/CTA.2025.50058 |
| 2025,42(12):2569-2576 |
| Keywords three-dimensional packing problem; deep reinforcement learning; multi-agent reinforcement learning; combinatorial optimization |
|
| Abstract |
| To improve decision efficiency and space utilization in the multi-bin three-dimensional bin packing problem (3D-BPP) under semi-online settings, this paper proposes a multi-agent hierarchical reinforcement learning algorithm. The problem is modeled as a multi-agent Markov decision process (MAMDP) in which three fully cooperative agents handle item selection, bin selection, and placement planning, respectively. A distributional (value-distribution) learning method is introduced to improve the stability and convergence of the algorithm. Experimental results show that the algorithm performs well across various environment configurations, significantly improving space utilization and the number of packed items, and that it generalizes well to multi-bin and multi-item selection scenarios. Compared with traditional heuristic algorithms, the proposed method has clear advantages in dynamic decision-making and adaptability, and is notably robust when handling items with unknown size distributions. The work is the first to apply a multi-agent hierarchical reinforcement learning framework to the 3D-BPP, achieving end-to-end optimization of packing decisions and providing a novel solution for complex packing scenarios. |
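
The three-level decision structure named in the abstract (item selection, then bin selection, then placement planning) can be illustrated with a small sketch. This is not the authors' implementation: the learned agents are replaced here by simple greedy rules, the bin state is a hypothetical integer heightmap, and the semi-online setting is assumed to mean a fixed lookahead window over the item queue. All names (`Bin`, `select_item`, `select_bin`, `select_position`, `pack`, `lookahead`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Item = Tuple[int, int, int]      # (length, width, height); integer sizes are an assumption
Position = Tuple[int, int, int]  # (x, y, z) of the item's lowest corner


@dataclass
class Bin:
    length: int
    width: int
    height: int
    used_volume: int = 0
    heightmap: List[List[int]] = field(init=False)

    def __post_init__(self) -> None:
        # heightmap[x][y] = current top surface of the stack at cell (x, y)
        self.heightmap = [[0] * self.width for _ in range(self.length)]

    def feasible_positions(self, item: Item) -> List[Position]:
        l, w, h = item
        out = []
        for x in range(self.length - l + 1):
            for y in range(self.width - w + 1):
                base = max(self.heightmap[i][j]
                           for i in range(x, x + l) for j in range(y, y + w))
                if base + h <= self.height:
                    out.append((x, y, base))
        return out

    def place(self, item: Item, pos: Position) -> None:
        l, w, h = item
        x, y, z = pos
        for i in range(x, x + l):
            for j in range(y, y + w):
                self.heightmap[i][j] = z + h
        self.used_volume += l * w * h

    def utilization(self) -> float:
        return self.used_volume / (self.length * self.width * self.height)


def select_item(window: Sequence[Item]) -> int:
    """Stand-in for the item-selection agent: pick the largest-volume item."""
    return max(range(len(window)), key=lambda k: window[k][0] * window[k][1] * window[k][2])


def select_bin(bins: Sequence[Bin], item: Item) -> Optional[int]:
    """Stand-in for the bin-selection agent: fullest bin that still fits the item."""
    feasible = [b for b in range(len(bins)) if bins[b].feasible_positions(item)]
    return max(feasible, key=lambda b: bins[b].used_volume) if feasible else None


def select_position(bin_: Bin, item: Item) -> Position:
    """Stand-in for the placement agent: lowest feasible position, ties by (x, y)."""
    return min(bin_.feasible_positions(item), key=lambda p: (p[2], p[0], p[1]))


def pack(items: Sequence[Item], bins: Sequence[Bin], lookahead: int = 2) -> int:
    """Run the three decisions in sequence for each step; return the number of packed items."""
    queue = list(items)
    packed = 0
    while queue:
        item = queue.pop(select_item(queue[:lookahead]))  # level 1: which item
        b = select_bin(bins, item)                        # level 2: which bin
        if b is None:
            continue                                      # no bin fits: item is skipped
        bins[b].place(item, select_position(bins[b], item))  # level 3: where
        packed += 1
    return packed
```

In a learned version, each of the three `select_*` functions would be replaced by a policy conditioned on the shared state and on the decisions made at the levels above it; the greedy rules here only show how the decisions chain together.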