Citation: WANG Qing, WANG Yu-jue, WANG Hao-ran, XIN Bin. Dynamic weapon-target assignment optimization integrating deep reinforcement learning and graph neural networks[J]. Control Theory & Applications, 2025, 42(11): 2252-2260.
Dynamic weapon-target assignment optimization integrating deep reinforcement learning and graph neural networks
Received: 2025-02-20    Revised: 2025-09-10
DOI: 10.7641/CTA.2025.50065
2025, 42(11): 2252-2260
Keywords: dynamic sensor-weapon-target assignment; deep reinforcement learning; graph neural network; OODA loop
Funding: Supported by the Beijing Natural Science Foundation General Program (4252050) and the National Science Fund for Distinguished Young Scholars of the National Natural Science Foundation of China (62425304).
Author    Affiliation    E-mail
WANG Qing    School of Automation, Beijing Institute of Technology    wangqing1020@bit.edu.cn
WANG Yu-jue    School of Automation, Beijing Institute of Technology
WANG Hao-ran    School of Automation, Beijing Institute of Technology
XIN Bin (corresponding author)    School of Automation, Beijing Institute of Technology    brucebin@bit.edu.cn
Abstract
      This paper proposes a dynamic sensor-weapon-target assignment (SWTA) method based on deep reinforcement learning (DRL) and graph neural networks (GNN), aimed at addressing the complex and dynamic decision-making requirements of modern battlefields. Traditional static methods are inefficient and lack adaptability in real-time changing battlefield environments. To tackle this issue, DRL is combined with GNN to build an intelligent decision-making framework. The framework leverages environmental interaction and deep learning to optimize decision-making strategies, thereby improving resource allocation efficiency and decision accuracy. Guided by OODA loop theory, the framework uses a GNN to capture the relationships between weapons, targets, and sensors on the battlefield and quickly generates assignment solutions; the DRL component then optimizes these strategies, enabling resource allocation optimization in dynamic environments. The optimization process takes into account operational effectiveness, resource consumption, and the protection of key locations. Experiments demonstrate that the method performs well across a variety of scenarios, significantly improving resource utilization and operational outcomes.
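The pipeline the abstract describes (a GNN encodes the weapon-target relational structure, and a policy head scores assignments for a DRL agent) can be illustrated with a minimal sketch. This is not the paper's implementation: the graph sizes, feature dimensions, random weights, the single mean-aggregation message-passing step, and the per-weapon softmax policy head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite engagement graph (sizes are hypothetical).
n_weapons, n_targets, d = 3, 4, 8
W = rng.normal(size=(n_weapons, d))        # weapon node features
T = rng.normal(size=(n_targets, d))        # target node features
A = np.ones((n_weapons, n_targets))        # adjacency: every weapon can reach every target

M = rng.normal(size=(d, d)) / np.sqrt(d)          # message weight (random stand-in)
U = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)  # node-update weight (random stand-in)

# One message-passing step: each weapon averages transformed
# features of the targets it can engage, then updates its embedding.
msgs = (A @ (T @ M)) / A.sum(axis=1, keepdims=True)
W_upd = np.tanh(np.concatenate([W, msgs], axis=1) @ U)

# Policy head: pairwise weapon-target scores, normalized per weapon
# with a softmax -- interpretable as a stochastic assignment policy
# that a DRL algorithm would train against a battlefield reward.
scores = W_upd @ (T @ M).T
probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

assignment = probs.argmax(axis=1)          # greedy readout: one target per weapon
```

In a full DRL setup, `probs` would parameterize the action distribution at each OODA decision step, and the reward would trade off operational effectiveness, resource consumption, and key-location protection, as the abstract describes.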