Cite this article: LI Xin-yi, LI Yan-jie. Priority-based replanning for multi-agent pathfinding with communication [J]. Control Theory & Applications, 2026, 43(4): 765-773.
Priority-based replanning for multi-agent pathfinding with communication
Received: 2024-02-25  Revised: 2026-01-09
DOI: 10.7641/CTA.2024.40112
2026, 43(4): 765-773
Keywords: replanning; priority; action detection; intrinsic reward; pathfinding
Foundation: Supported by the Shenzhen Basic Research Program (JCYJ20180507183837726, JCYJ20220818102415033, JSGG20201103093802006).
Authors, affiliations, and e-mail:
LI Xin-yi, Guangdong Provincial Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics, School of Mechatronics Engineering and Automation, Harbin Institute of Technology (Shenzhen), 22S153132@stu.hit.edu.cn
LI Yan-jie* (corresponding author), Guangdong Provincial Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics, School of Mechatronics Engineering and Automation, Harbin Institute of Technology (Shenzhen), autolyj@hit.edu.cn
Abstract
      Multi-agent reinforcement learning (MARL), with its outstanding generalization capabilities and rapid computation speed, has become an effective tool for solving tasks with demanding real-time requirements and has been extensively researched and applied in multi-agent pathfinding (MAPF) problems. Nevertheless, MARL-based MAPF solvers still face the challenge of planning collision-free paths for agents when agent density is extremely high. Most existing learning-based MAPF solvers tend to pause an agent's movement as the fallback action when a potential collision is predicted; lacking an effective coordination mechanism among agents, this may lead to deadlocks. To address this challenge, this study proposes a method that combines reinforcement learning with a communication network equipped with an attention mechanism, aimed at improving the solution of the MAPF problem. Moreover, to reduce the occurrence of deadlocks in high-density agent environments, this study designs a set of action detection and replanning strategies. An intrinsic reward mechanism is also introduced to encourage agents to explore the environment and to accelerate their progress toward their target locations. Experimental validation shows that the proposed method significantly outperforms existing advanced learning-based MAPF methods in terms of accuracy and performs comparably to near-optimal solvers in multiple environments.
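The abstract does not spell out the detection and replanning rules, so the following is only a minimal sketch of the general idea of priority-based conflict resolution on a grid: detect vertex and edge (swap) conflicts among proposed one-step moves, then have the lowest-priority conflicted agent fall back to waiting in place. All function names, the priority scheme, and the wait-only fallback are illustrative assumptions, not the paper's algorithm; the paper's replanner presumably also considers alternative moves and learned communication.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # grid cell (row, col)

def detect_conflicts(current: Dict[int, Pos],
                     proposed: Dict[int, Pos]) -> Set[int]:
    """Return ids of agents whose proposed one-step moves collide."""
    conflicted: Set[int] = set()
    # Vertex conflict: two or more agents propose the same cell.
    by_cell: Dict[Pos, List[int]] = {}
    for aid, cell in proposed.items():
        by_cell.setdefault(cell, []).append(aid)
    for ids in by_cell.values():
        if len(ids) > 1:
            conflicted.update(ids)
    # Edge (swap) conflict: two agents exchange cells in one step.
    for a, pa in proposed.items():
        for b, pb in proposed.items():
            if a < b and pa == current[b] and pb == current[a]:
                conflicted.update((a, b))
    return conflicted

def replan_step(current: Dict[int, Pos],
                proposed: Dict[int, Pos],
                priority: Dict[int, int]) -> Dict[int, Pos]:
    """Resolve one step: higher-priority agents keep their moves; the
    lowest-priority conflicted agent still moving waits in place instead.
    (Illustrative wait-only fallback; a fuller replanner would search
    for an alternative cell.)"""
    result = dict(proposed)
    while True:
        conflicted = detect_conflicts(current, result)
        movers = [aid for aid in conflicted if result[aid] != current[aid]]
        if not movers:  # no resolvable conflicts remain
            return result
        loser = min(movers, key=lambda aid: priority[aid])
        result[loser] = current[loser]  # lowest-priority mover waits
```

For example, if three agents all propose the same cell, only the highest-priority one moves there while the other two wait for that step; each iteration of the loop demotes exactly one mover, so the procedure terminates.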