Cite this article: WANG Yi-song, ZHAO Ming-hui, ZHANG Xue-bo. ASM2: Multi-agent multi-opponent game algorithm for joint sea-air scenarios[J]. Control Theory & Applications, 2025, 42(7): 1275-1284.
ASM2: Multi-agent multi-opponent game algorithm for joint sea-air scenarios
Received: 2023-04-17  Revised: 2024-12-20
DOI  10.7641/CTA.2024.30220
  2025,42(7):1275-1284
Keywords  intelligent confrontation of unmanned systems  wargame  air-sea joint operations  intelligent control  multi-agent reinforcement learning
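Funding  Supported by the National Natural Science Foundation of China (62293510, 62293513), the Tianjin Science Fund for Distinguished Young Scholars (19JCJQJC62100), and the Fundamental Research Funds for the Central Universities.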
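Author  Affiliation  E-mail
WANG Yi-song  Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University  2120220504@mail.nankai.edu.cn
ZHAO Ming-hui  Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University
ZHANG Xue-bo  Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University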
Abstract
      In complex air-sea joint intelligent game environments, situational information is high-dimensional and changes dynamically, which makes collaborative decision-making among heterogeneous combat units very challenging. Most existing algorithms also suffer from dimensionality explosion and poor generalization. How to achieve collaborative decision-making among heterogeneous combat units from limited situational information is therefore a pressing problem. To address it, this paper first proposes a formal modeling approach for the air-sea joint intelligent game problem that comprehensively and effectively represents the situational information of the game environment, commands and controls heterogeneous combat units, and guides the direction of algorithm training. Second, it proposes the ASM2 (air-sea multi-opponent multi-agent proximal policy optimization) algorithm for air-sea joint games. Built on the multi-agent proximal policy optimization (MAPPO) distributed multi-agent game algorithm, ASM2 adds a multi-opponent multi-agent training framework with an embedded Elo rating system, which improves the generalization ability of the model. Finally, validation tests on a wargame simulation platform show that models trained with the proposed algorithm can effectively cope with a variety of expert opponent strategies, indicating good feasibility and generalization ability, and that the approach can help improve the combat confrontation capabilities of future complex unmanned equipment.
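The abstract does not say how the Elo system is wired into training, so the following is only a minimal sketch of the standard Elo update that such a framework could apply after each match against an opponent in the pool. The K-factor of 32, the starting rating of 1500, the names `learner`, `expert_a`, `expert_b`, and the match results are illustrative assumptions, not values from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k = 32 is an assumed K-factor, not a value taken from the paper.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical pool: one learning policy and two expert opponents.
ratings = {"learner": 1500.0, "expert_a": 1500.0, "expert_b": 1500.0}

# Hypothetical match log: (opponent name, learner's score in that match).
results = [("expert_a", 1.0), ("expert_b", 0.0), ("expert_a", 0.5)]

for opponent, score in results:
    ratings["learner"], ratings[opponent] = update_elo(
        ratings["learner"], ratings[opponent], score
    )
```

Because each update moves the two ratings by equal and opposite amounts, the total rating in the pool stays constant. The ratings can then be compared across opponents, for example to judge which expert strategies the learner still struggles against.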