ASM2: Multi-agent multi-opponent game algorithm for joint sea-air scenarios

WANG Yi-song; ZHAO Ming-hui; ZHANG Xue-bo

Cite this article: WANG Yi-song, ZHAO Ming-hui, ZHANG Xue-bo. ASM2: Multi-agent multi-opponent game algorithm for joint sea-air scenarios[J]. Control Theory & Applications, 2025, 42(7): 1275-1284.

Received: 2023-04-17 Revised: 2024-12-20

DOI: 10.7641/CTA.2024.30220

2025,42(7):1275-1284

Keywords: intelligent confrontation of unmanned systems; wargame; air-sea joint operations; intelligent control; multi-agent reinforcement learning

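Funding: Supported by the National Natural Science Foundation of China (62293510, 62293513), the Tianjin Science Fund for Distinguished Young Scholars (19JCJQJC62100), and the Fundamental Research Funds for the Central Universities.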

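Author	Affiliation	E-mail
WANG Yi-song	Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University	2120220504@mail.nankai.edu.cn
ZHAO Ming-hui	Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University
ZHANG Xue-bo	Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University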

Abstract

In complex air-sea joint intelligent game environments, situational information is high-dimensional and changes dynamically, which makes collaborative decision-making among heterogeneous combat units difficult. Most existing algorithms also suffer from dimensionality explosion and poor generalization. How to achieve collaborative decision-making among heterogeneous combat units from limited situational information is therefore an urgent open problem. To address it, this paper first proposes a formal modeling approach for the air-sea joint intelligent game problem that comprehensively and effectively represents the situational information, commands and controls heterogeneous combat units, and guides the direction of algorithm training. Second, it proposes the ASM2 (air-sea multi-opponent multi-agent proximal policy optimization) algorithm for air-sea joint games. Built on the distributed multi-agent proximal policy optimization (MAPPO) algorithm, ASM2 uses a multi-opponent multi-agent training framework with an embedded Elo rating system, which improves the generalization ability of the model. Finally, validation tests on a wargame simulation platform show that the model trained with the proposed algorithm can effectively handle a variety of expert opponent strategies and has good feasibility and generalization ability, which can help improve the confrontation capability of future complex unmanned equipment.
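
As a rough illustration of the Elo component mentioned in the abstract, the sketch below applies the standard Elo update after each match between a learner and a pool of opponents. The abstract does not say how ASM2 uses the ratings, so the K-factor of 32, the 400-point scale, the initial rating of 1500, and the names `learner`, `expert_A`, and `expert_B` are all assumptions chosen for illustration, not details from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated (r_a, r_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (assumed value, not taken from the paper).
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum update: what A gains, B loses.
    return r_a + delta, r_b - delta


# Hypothetical rating table: one learner and two expert opponents.
ratings = {"learner": 1500.0, "expert_A": 1500.0, "expert_B": 1500.0}

# Hypothetical match log: (opponent, learner's score).
results = [("expert_A", 1.0), ("expert_B", 0.0), ("expert_A", 1.0)]

for opponent, score in results:
    ratings["learner"], ratings[opponent] = update_elo(
        ratings["learner"], ratings[opponent], score
    )
```

In a multi-opponent training loop, a table like `ratings` could be used, for example, to track which opponents the learner still loses to; whether ASM2 does this is not stated in the abstract.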