Cite this article: LI Xin-yi, LI Yan-jie. Priority-based replanning for multi-agent pathfinding with communication [J]. Control Theory & Applications, 2026, 43(4): 765-773.
Priority-based replanning for multi-agent pathfinding with communication
Received: 2024-02-25  Revised: 2026-01-09
DOI: 10.7641/CTA.2024.40112
2026, 43(4): 765-773
Keywords: replanning; priority; action detection; intrinsic reward; pathfinding
Foundation: Supported by the Shenzhen Basic Research Program (JCYJ20180507183837726, JCYJ20220818102415033, JSGG20201103093802006).
Authors, affiliations, and e-mail:
LI Xin-yi, Guangdong Provincial Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics, School of Mechatronics Engineering and Automation, Harbin Institute of Technology (Shenzhen), 22S153132@stu.hit.edu.cn
LI Yan-jie* (corresponding author), Guangdong Provincial Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics, School of Mechatronics Engineering and Automation, Harbin Institute of Technology (Shenzhen), autolyj@hit.edu.cn
Abstract
      Multi-agent reinforcement learning (MARL), with its outstanding generalization capabilities and rapid computation speed, has become an effective tool for solving tasks with demanding real-time requirements and has been extensively researched and applied in multi-agent pathfinding (MAPF) problems. Nevertheless, MARL-based MAPF solvers still face the challenge of planning collision-free paths for agents when agent density is extremely high. Most existing learning-based MAPF solvers tend to pause an agent's movement as the fallback action when a potential collision is predicted; lacking an effective coordination mechanism among agents, this may lead to deadlocks. To address this challenge, this study proposes a method that combines reinforcement learning with a communication network equipped with an attention mechanism, aimed at improving the solution of the MAPF problem. Moreover, to reduce the occurrence of deadlocks in high-density agent environments, this study designs a set of action detection and replanning strategies. An intrinsic reward mechanism is also introduced to encourage agents to explore the environment and to accelerate their progress toward their target locations. Experimental validation shows that the proposed method significantly outperforms existing advanced learning-based MAPF methods in terms of accuracy and performs comparably to near-optimal solvers in multiple environments.
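The abstract does not spell out the detection and replanning rules, so the following is only a minimal sketch of the general idea of priority-based conflict resolution on a grid: detect vertex and edge (swap) conflicts among proposed one-step moves, then have the lowest-priority conflicted agent fall back to waiting in place. All function names, the priority scheme, and the wait-only fallback are illustrative assumptions, not the paper's algorithm; the paper's replanner presumably also considers alternative moves and learned communication.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # grid cell (row, col)

def detect_conflicts(current: Dict[int, Pos],
                     proposed: Dict[int, Pos]) -> Set[int]:
    """Return ids of agents whose proposed one-step moves collide."""
    conflicted: Set[int] = set()
    # Vertex conflict: two or more agents propose the same cell.
    by_cell: Dict[Pos, List[int]] = {}
    for aid, cell in proposed.items():
        by_cell.setdefault(cell, []).append(aid)
    for ids in by_cell.values():
        if len(ids) > 1:
            conflicted.update(ids)
    # Edge (swap) conflict: two agents exchange cells in one step.
    for a, pa in proposed.items():
        for b, pb in proposed.items():
            if a < b and pa == current[b] and pb == current[a]:
                conflicted.update((a, b))
    return conflicted

def replan_step(current: Dict[int, Pos],
                proposed: Dict[int, Pos],
                priority: Dict[int, int]) -> Dict[int, Pos]:
    """Resolve one step: higher-priority agents keep their moves; the
    lowest-priority conflicted agent still moving waits in place instead.
    (Illustrative wait-only fallback; a fuller replanner would search
    for an alternative cell.)"""
    result = dict(proposed)
    while True:
        conflicted = detect_conflicts(current, result)
        movers = [aid for aid in conflicted if result[aid] != current[aid]]
        if not movers:  # no resolvable conflicts remain
            return result
        loser = min(movers, key=lambda aid: priority[aid])
        result[loser] = current[loser]  # lowest-priority mover waits
```

For example, if three agents all propose the same cell, only the highest-priority one moves there while the other two wait for that step; each iteration of the loop demotes exactly one mover, so the procedure terminates.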