Cite this article: LI Yue-heng, XIE Guang-ming. Review of multi-agent reinforcement learning under centralized training with decentralized execution[J]. Control Theory & Applications, 2025, 42(11): 2114-2124.
|
Review of multi-agent reinforcement learning under centralized training with decentralized execution
Received: 2025-01-06    Revised: 2025-04-15
DOI: 10.7641/CTA.2025.50009
2025, 42(11): 2114-2124
Keywords: multi-agent reinforcement learning; centralized training with decentralized execution; value factorization; reinforcement learning
Funding: Supported by the National Natural Science Foundation of China (U22A2062, U23B2037) and the Beijing Natural Science Foundation (L222084, 3242003).
|
Abstract
In recent years, multi-agent reinforcement learning (MARL) has gained significant attention due to its potential in solving complex real-world problems. The centralized training with decentralized execution (CTDE) framework has been widely adopted in complex multi-agent systems. CTDE alleviates the non-stationarity problem in MARL through centralized training, but it also introduces new challenges, particularly the need to process the information of all agents during training and, above all, the joint action space that grows exponentially with the number of agents. This paper provides a selective review of the algorithms and developments in cooperative multi-agent reinforcement learning within the CTDE framework, focusing on two main approaches: value function factorization methods and policy gradient methods. It summarizes the performance of these algorithms in addressing the complexity of joint action spaces, non-stationarity, and estimation errors, and proposes new perspectives and ideas, aiming to provide researchers in the field with deeper insights and guidance for future studies.
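As context for the two technical points the abstract emphasizes, here is a minimal sketch in standard notation, using VDN and QMIX as representative value factorization methods (these specific algorithms are assumptions of this sketch; the abstract itself does not name them). With $n$ agents, agent $i$ having individual action set $\mathcal{A}_i$ and action-observation history $\tau_i$, the joint action space is the Cartesian product of the individual ones, so its size grows exponentially with $n$:
\[
|\mathcal{A}| \,=\, \prod_{i=1}^{n} |\mathcal{A}_i| \,=\, |\mathcal{A}_1|^{n} \quad \text{(identical agents)}.
\]
Value factorization methods avoid searching this joint space directly by decomposing the joint action value into per-agent utilities, e.g.
\[
\text{VDN:}\ \ Q_{\mathrm{tot}}(\boldsymbol{\tau},\boldsymbol{a}) \,=\, \sum_{i=1}^{n} Q_i(\tau_i, a_i),
\qquad
\text{QMIX:}\ \ Q_{\mathrm{tot}} \,=\, f_{\mathrm{mix}}\bigl(Q_1,\dots,Q_n;\, s\bigr)\ \ \text{with}\ \ \frac{\partial Q_{\mathrm{tot}}}{\partial Q_i} \ge 0\ \ \forall i.
\]
Both constructions satisfy the individual-global-max (IGM) condition,
\[
\arg\max_{\boldsymbol{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau},\boldsymbol{a})
\,=\, \Bigl(\arg\max_{a_1} Q_1(\tau_1,a_1),\ \dots,\ \arg\max_{a_n} Q_n(\tau_n,a_n)\Bigr),
\]
so each agent can act greedily on its own $Q_i$ at execution time, which is exactly what decentralized execution requires; the mixing network (and the global state $s$ it conditions on) is used only during centralized training. Policy gradient methods in the CTDE family follow the same pattern, pairing a centralized critic $Q(s, a_1, \dots, a_n)$ trained on joint information with decentralized actors $\pi_i(a_i \mid \tau_i)$.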
|
|
|
|
|