Cite this article: LI Yue-heng, XIE Guang-ming. Review of multi-agent reinforcement learning under centralized training with decentralized execution[J]. Control Theory & Applications, 2025, 42(11): 2114-2124.
Review of multi-agent reinforcement learning under centralized training with decentralized execution
Received: 2025-01-06; Revised: 2025-04-15
DOI: 10.7641/CTA.2025.50009
Keywords: multi-agent reinforcement learning; centralized training with decentralized execution; value factorization; reinforcement learning
Funding: Supported by the National Natural Science Foundation of China (U22A2062, U23B2037) and the Beijing Natural Science Foundation (L222084, 3242003).
Author  Affiliation  E-mail
LI Yue-heng  School of Advanced Manufacturing and Robotics, Peking University  liyueheng@pku.edu.cn
XIE Guang-ming* (corresponding author)  Institute for Artificial Intelligence, Peking University  xiegming@pku.edu.cn
Abstract
      In recent years, multi-agent reinforcement learning (MARL) has gained significant attention due to its potential for solving complex real-world problems. The centralized training with decentralized execution (CTDE) framework has been widely adopted in complex multi-agent systems. CTDE alleviates the non-stationarity problem in MARL through centralized training, but it also introduces new challenges: the training process must handle the information of all agents, and in particular a joint action space that grows exponentially with the number of agents. This paper provides a selective review of the algorithms and developments in cooperative multi-agent reinforcement learning within the CTDE framework, focusing on two main classes of approaches: value function factorization methods and policy gradient methods. The paper summarizes how these algorithms perform in addressing the complexity of joint action spaces, non-stationarity, and estimation errors, and proposes new perspectives and ideas, aiming to provide researchers in the field with deeper insights and guidance for future studies.
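
For readers new to the two algorithm families named above, the following is a minimal sketch of their defining equations. These formulas come from the standard literature (VDN, QMIX, and MADDPG respectively), not from this article's own text; the notation ($\tau_i$ for agent $i$'s action-observation history, $\mathbf{u}$ for the joint action, $\mathbf{x}$ for the global state) is assumed here for illustration.

% Value function factorization: VDN decomposes the joint action-value
% function additively, and QMIX relaxes this to any monotonic mixing,
% so that a per-agent argmax still recovers the joint greedy action
% (the individual-global-max, IGM, condition).
\begin{align}
  Q_{\mathrm{tot}}^{\mathrm{VDN}}(\boldsymbol{\tau}, \mathbf{u})
    &= \sum_{i=1}^{N} Q_i(\tau_i, u_i), \\
  \frac{\partial Q_{\mathrm{tot}}^{\mathrm{QMIX}}}{\partial Q_i} &\ge 0,
    \quad i = 1, \dots, N, \\
  \arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
    &= \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1), \dots,
              \arg\max_{u_N} Q_N(\tau_N, u_N) \Bigr).
\end{align}

% Centralized-critic policy gradient (MADDPG-style): each decentralized
% actor \mu_i, parameterized by \theta_i, sees only its own observation
% o_i at execution time, while the critic Q_i^{\mu} conditions on the
% joint information available only during centralized training.
\begin{equation}
  \nabla_{\theta_i} J(\theta_i)
    = \mathbb{E}\Bigl[ \nabla_{\theta_i} \mu_i(o_i)\,
        \nabla_{a_i} Q_i^{\mu}(\mathbf{x}, a_1, \dots, a_N)
        \Big|_{a_j = \mu_j(o_j)} \Bigr].
\end{equation}

Both sketches make the abstract's trade-off concrete: factorization sidesteps the exponential joint action space by learning per-agent utilities $Q_i$, while the centralized critic addresses non-stationarity by conditioning on all agents' actions during training only.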