Citation: WANG Qing, WANG Yu-jue, WANG Hao-ran, XIN Bin. Dynamic weapon-target assignment optimization integrating deep reinforcement learning and graph neural networks[J]. Control Theory & Applications, 2025, 42(11): 2252-2260.
Dynamic weapon-target assignment optimization integrating deep reinforcement learning and graph neural networks
Received: 2025-02-20    Revised: 2025-09-10
DOI: 10.7641/CTA.2025.50065
2025, 42(11): 2252-2260
Keywords: dynamic sensor-weapon-target assignment; deep reinforcement learning; graph neural network; OODA loop
Funding: Supported by the Beijing Natural Science Foundation General Program (4252050) and the National Science Fund for Distinguished Young Scholars of the National Natural Science Foundation of China (62425304).
Author    Affiliation    E-mail
WANG Qing    School of Automation, Beijing Institute of Technology    wangqing1020@bit.edu.cn
WANG Yu-jue    School of Automation, Beijing Institute of Technology
WANG Hao-ran    School of Automation, Beijing Institute of Technology
XIN Bin (corresponding author)    School of Automation, Beijing Institute of Technology    brucebin@bit.edu.cn
Abstract
      This paper proposes a dynamic sensor-weapon-target assignment (SWTA) method based on deep reinforcement learning (DRL) and graph neural networks (GNN), aimed at addressing the complex and dynamic decision-making requirements of modern battlefields. Traditional static methods are inefficient and lack adaptability in real-time changing battlefield environments. To tackle this issue, DRL is combined with GNN to build an intelligent decision-making framework. The framework leverages environmental interaction and deep learning to optimize decision-making strategies, thereby improving resource allocation efficiency and decision accuracy. Guided by OODA loop theory, the framework uses a GNN to capture the relationships between weapons, targets, and sensors on the battlefield and quickly generates assignment solutions; the DRL component then optimizes these strategies, enabling resource allocation optimization in dynamic environments. The optimization process takes into account operational effectiveness, resource consumption, and the protection of key locations. Experiments demonstrate that the method performs well across a variety of scenarios, significantly improving resource utilization and operational outcomes.
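The pipeline the abstract describes (a GNN encodes the weapon-target relational structure, and a policy head scores assignments for a DRL agent) can be illustrated with a minimal sketch. This is not the paper's implementation: the graph sizes, feature dimensions, random weights, the single mean-aggregation message-passing step, and the per-weapon softmax policy head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite engagement graph (sizes are hypothetical).
n_weapons, n_targets, d = 3, 4, 8
W = rng.normal(size=(n_weapons, d))        # weapon node features
T = rng.normal(size=(n_targets, d))        # target node features
A = np.ones((n_weapons, n_targets))        # adjacency: every weapon can reach every target

M = rng.normal(size=(d, d)) / np.sqrt(d)          # message weight (random stand-in)
U = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)  # node-update weight (random stand-in)

# One message-passing step: each weapon averages transformed
# features of the targets it can engage, then updates its embedding.
msgs = (A @ (T @ M)) / A.sum(axis=1, keepdims=True)
W_upd = np.tanh(np.concatenate([W, msgs], axis=1) @ U)

# Policy head: pairwise weapon-target scores, normalized per weapon
# with a softmax -- interpretable as a stochastic assignment policy
# that a DRL algorithm would train against a battlefield reward.
scores = W_upd @ (T @ M).T
probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

assignment = probs.argmax(axis=1)          # greedy readout: one target per weapon
```

In a full DRL setup, `probs` would parameterize the action distribution at each OODA decision step, and the reward would trade off operational effectiveness, resource consumption, and key-location protection, as the abstract describes.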