Cite this article: LI Yue-heng, XIE Guang-ming. Review of multi-agent reinforcement learning under centralized training with decentralized execution[J]. Control Theory & Applications, 2025, 42(11): 2114-2124.
|
Review of multi-agent reinforcement learning under centralized training with decentralized execution
Received: 2025-01-06    Revised: 2025-04-15
DOI: 10.7641/CTA.2025.50009
2025, 42(11): 2114-2124
Keywords: multi-agent reinforcement learning; centralized training with decentralized execution; value factorization; reinforcement learning
Funding: Supported by the National Natural Science Foundation of China (U22A2062, U23B2037) and the Beijing Natural Science Foundation (L222084, 3242003).
|
Abstract
In recent years, multi-agent reinforcement learning (MARL) has gained significant attention due to its potential in solving complex real-world problems. The centralized training with decentralized execution (CTDE) framework has been widely adopted in complex multi-agent systems. CTDE alleviates the non-stationarity problem in MARL through centralized training, but it also introduces new challenges, particularly the need to process the information of all agents during training and, above all, the joint action space that grows exponentially with the number of agents. This paper provides a selective review of the algorithms and developments in cooperative multi-agent reinforcement learning within the CTDE framework, focusing on two main approaches: value function factorization methods and policy gradient methods. It summarizes the performance of these algorithms in addressing the complexity of joint action spaces, non-stationarity, and estimation errors, and proposes new perspectives and ideas, aiming to provide researchers in the field with deeper insights and guidance for future studies.
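As context for the two technical points the abstract emphasizes, here is a minimal sketch in standard notation, using VDN and QMIX as representative value factorization methods (these specific algorithms are assumptions of this sketch; the abstract itself does not name them). With $n$ agents, agent $i$ having individual action set $\mathcal{A}_i$ and action-observation history $\tau_i$, the joint action space is the Cartesian product of the individual ones, so its size grows exponentially with $n$:
\[
|\mathcal{A}| \,=\, \prod_{i=1}^{n} |\mathcal{A}_i| \,=\, |\mathcal{A}_1|^{n} \quad \text{(identical agents)}.
\]
Value factorization methods avoid searching this joint space directly by decomposing the joint action value into per-agent utilities, e.g.
\[
\text{VDN:}\ \ Q_{\mathrm{tot}}(\boldsymbol{\tau},\boldsymbol{a}) \,=\, \sum_{i=1}^{n} Q_i(\tau_i, a_i),
\qquad
\text{QMIX:}\ \ Q_{\mathrm{tot}} \,=\, f_{\mathrm{mix}}\bigl(Q_1,\dots,Q_n;\, s\bigr)\ \ \text{with}\ \ \frac{\partial Q_{\mathrm{tot}}}{\partial Q_i} \ge 0\ \ \forall i.
\]
Both constructions satisfy the individual-global-max (IGM) condition,
\[
\arg\max_{\boldsymbol{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau},\boldsymbol{a})
\,=\, \Bigl(\arg\max_{a_1} Q_1(\tau_1,a_1),\ \dots,\ \arg\max_{a_n} Q_n(\tau_n,a_n)\Bigr),
\]
so each agent can act greedily on its own $Q_i$ at execution time, which is exactly what decentralized execution requires; the mixing network (and the global state $s$ it conditions on) is used only during centralized training. Policy gradient methods in the CTDE family follow the same pattern, pairing a centralized critic $Q(s, a_1, \dots, a_n)$ trained on joint information with decentralized actors $\pi_i(a_i \mid \tau_i)$.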
|
|
|
|
|