Cite this article: LI Xin-yi, LI Yan-jie. Priority-based replanning for multi-agent pathfinding with communication [J]. Control Theory & Applications, 2026, 43(4): 765-773.
|
Priority-based replanning for multi-agent pathfinding with communication
Submitted: 2024-02-25    Revised: 2026-01-09
DOI: 10.7641/CTA.2024.40112
2026, 43(4): 765-773
Keywords: replanning; priority; action detection; intrinsic reward; path planning
Funding: Supported by the Shenzhen Basic Research Program (JCYJ20180507183837726, JCYJ20220818102415033, JSGG20201103093802006).
|
Abstract
Multi-agent reinforcement learning (MARL), with its strong generalization capability and fast computation, has become an effective tool for tasks with demanding real-time requirements and has been extensively studied and applied to the multi-agent pathfinding (MAPF) problem. Nevertheless, MARL-based MAPF solvers still struggle to plan collision-free paths when the agent density is extremely high. When a potential collision is predicted, most existing learning-based MAPF solvers simply pause the affected agent as a fallback action and lack an effective coordination mechanism among agents, which may lead to deadlocks. To address this challenge, this study proposes a method that combines reinforcement learning with an attention-based communication network, aimed at improving the solution of the MAPF problem. Moreover, to reduce the occurrence of deadlocks in high-density agent environments, an action detection and replanning strategy is designed. An intrinsic reward mechanism is also introduced to encourage agents to explore the environment and to accelerate their progress toward the target locations. Experimental validation shows that the proposed method significantly outperforms existing advanced learning-based MAPF methods in terms of accuracy and performs comparably to near-optimal solvers in multiple environments.
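The abstract does not spell out the network architecture or the replanning procedure, so the following minimal Python sketch only illustrates the general idea of action detection with priority-based replanning described above. All names (Agent, plan_single_path, detect_and_replan) and conventions (a 4-connected grid, fixed integer priorities, vertex-conflict checking only) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code): action detection with
# priority-based replanning on a 4-connected grid. Only vertex conflicts are
# checked, for brevity; edge (swap) conflicts are ignored.
from dataclasses import dataclass
from heapq import heappush, heappop
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # 4-connected moves


@dataclass
class Agent:
    idx: int
    pos: Cell
    goal: Cell
    priority: int  # higher value keeps its proposed action in a conflict


def plan_single_path(start: Cell, goal: Cell, blocked: Set[Cell], size: int) -> List[Cell]:
    """A* for a single agent, treating `blocked` cells as obstacles."""
    def h(c: Cell) -> int:  # Manhattan-distance heuristic
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap: List[Tuple[int, int, Cell]] = [(h(start), 0, start)]
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    best_g: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, g, cur = heappop(open_heap)
        if cur == goal:  # reconstruct path from goal back to start
            path = [cur]
            while parent[cur] is not None:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        for dx, dy in MOVES:
            nxt = (cur[0] + dx, cur[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
                continue
            if g + 1 < best_g.get(nxt, 1 << 30):
                best_g[nxt] = g + 1
                parent[nxt] = cur
                heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    return [start]  # no path found: stay in place


def detect_and_replan(agents: List[Agent], proposed: Dict[int, Cell], size: int) -> Dict[int, Cell]:
    """Action detection: if moves proposed by the learned policy collide,
    lower-priority agents fall back to a replanned move (or wait)."""
    final: Dict[int, Cell] = {}
    reserved: Set[Cell] = set()  # cells already claimed by higher-priority agents
    for ag in sorted(agents, key=lambda a: -a.priority):
        nxt = proposed[ag.idx]
        if nxt in reserved:  # potential vertex collision detected
            blocked = reserved | {a.pos for a in agents if a.idx != ag.idx}
            path = plan_single_path(ag.pos, ag.goal, blocked, size)
            nxt = path[1] if len(path) > 1 else ag.pos  # replanned step, else wait
        final[ag.idx] = nxt
        reserved.add(nxt)
    return final
```

In this sketch the priority order simply decides which agent keeps its learned action when a conflict is detected; in the paper, coordination is additionally driven by the attention-based communication network and the intrinsic reward summarized in the abstract.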
|
|
|
|
|