Cite this article: WANG Yu, REN Tian-jun, FAN Zi-lin, MENG Guang-lei. Distributed DDPG UAV pursuit decision based on angle feature [J]. Control Theory & Applications, 2025, 42(7): 1356-1366.
|
Distributed DDPG UAV pursuit decision based on angle feature
Received: 2023-03-05  Revised: 2025-04-05
DOI: 10.7641/CTA.2024.30105
2025, 42(7): 1356-1366
Keywords: pursuit decision-making; reinforcement learning; distributed DDPG algorithm; angle feature
Funding: Supported by the National Natural Science Foundation of China (61906125, 62373261) and the Fundamental Research Funds for the Universities of Liaoning Province (LJ232410143020, LJ212410143047).
|
Abstract
The situation of a UAV changes rapidly during a pursuit mission, and an inflexible network update mechanism combined with a fixed reward function makes it difficult for existing decision models to continuously output correct and efficient strategies. To address this problem, a distributed deep deterministic policy gradient (DDPG) algorithm based on angle features is proposed. First, to avoid vanishing or exploding gradients and to stabilize model training, an Actor network update mechanism is proposed: gradient ascent is used to compute the target value of the Actor network, and the Actor network is then trained with a mean-square error (MSE) loss function. Then, strategy guidance regions are divided according to the angle features of both sides. By assigning different weights to the reward function in each region, a distributed decision-making model is built from five DDPG networks; the dynamic selection and seamless switching of reward-function weights under different situations improve the decision-making ability of the algorithm. Simulation results show that, compared with DDPG and the twin delayed deep deterministic policy gradient (TD3) algorithm, the proposed algorithm achieves a higher success rate and higher decision-making efficiency when pursuing either a linearly escaping target or an intelligently escaping target.
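The Actor update mechanism described above can be sketched in miniature: a target action is first obtained by gradient ascent on the critic's value with respect to the action, and the actor is then regressed onto that target with an MSE loss. This is a minimal NumPy illustration, not the paper's implementation: the toy critic Q(s, a) = -(a - 2s)^2, the linear actor a = w*s, and all learning rates are illustrative assumptions.

```python
import numpy as np

def critic_q(s, a):
    """Toy differentiable critic, maximized at a = 2*s (illustrative only)."""
    return -(a - 2.0 * s) ** 2

def critic_grad_a(s, a):
    """dQ/da for the toy critic."""
    return -2.0 * (a - 2.0 * s)

def target_action(s, a0, steps=50, lr=0.1):
    """Step 1: gradient *ascent* on Q w.r.t. the action yields a target value."""
    a = a0
    for _ in range(steps):
        a = a + lr * critic_grad_a(s, a)
    return a

# Step 2: train a linear actor a = w * s against the targets with an MSE loss.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=256)
w = 0.0
for _ in range(200):
    pred = w * states
    targets = target_action(states, pred)
    # MSE loss L = mean((pred - target)^2); plain gradient descent on w
    grad_w = np.mean(2.0 * (pred - targets) * states)
    w -= 0.5 * grad_w

print(round(w, 2))  # the actor converges toward the critic's optimum a = 2*s
```

Because the target is computed before the regression step, the actor never backpropagates through the critic directly, which is the stabilizing property the abstract attributes to this update scheme.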
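The distributed decision model can likewise be sketched: the angle feature selects which of the five DDPG policies is active, and each policy carries its own reward-function weights, so switching regions switches weights seamlessly. The region boundaries and weight values below are hypothetical placeholders, not values from the paper.

```python
# Hypothetical angle regions: (upper bound in degrees, policy index).
REGIONS = [(36.0, 0), (72.0, 1), (108.0, 2), (144.0, 3), (180.0, 4)]

# Hypothetical per-policy weights for the (angle term, distance term) of the reward.
REWARD_WEIGHTS = {0: (0.8, 0.2), 1: (0.6, 0.4), 2: (0.5, 0.5),
                  3: (0.4, 0.6), 4: (0.2, 0.8)}

def select_policy(angle_deg):
    """Map the current angle feature to the DDPG policy responsible for it."""
    for bound, idx in REGIONS:
        if angle_deg <= bound:
            return idx
    raise ValueError("angle must lie in [0, 180] degrees")

def reward(angle_deg, distance, max_distance=1000.0):
    """Composite reward whose weights switch with the active region."""
    w_angle, w_dist = REWARD_WEIGHTS[select_policy(angle_deg)]
    return (w_angle * (1.0 - angle_deg / 180.0)
            + w_dist * (1.0 - distance / max_distance))
```

For example, `select_policy(100.0)` routes the decision to the third network, and a small change of angle across a region boundary changes only the weight pair, not the reward's functional form, which keeps the switch seamless.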
|
|
|
|
|