Cite this article: ZOU Qi-jie, LIU Shi-hui, ZHANG Yue, HOU Ying-li. Rapidly-exploring random tree algorithm for path re-planning based on reinforcement learning under the peculiar environment [J]. Control Theory & Applications, 2020, 37(8): 1737–1748.
Rapidly-exploring random tree algorithm for path re-planning based on reinforcement learning under the peculiar environment
Received: 2019-07-26    Revised: 2020-03-28
DOI: 10.7641/CTA.2020.90622
2020, 37(8): 1737–1748
Keywords: rapidly-exploring random tree (RRT); Sarsa(λ); local path re-planning; mobile robots; peculiar environment
Funding: Supported by the General Program of the National Natural Science Foundation of China (61673084) and the Natural Science Foundation of Liaoning Province (2019-ZD-0578).
Author    Affiliation    E-mail
ZOU Qi-jie*    Dalian University    jessie_zou_zou@163.com
LIU Shi-hui    Dalian University    1473245950@qq.com
ZHANG Yue    Dalian University
HOU Ying-li    Dalian University
Chinese abstract
      Aiming at the low efficiency of path planning for mobile robots in unknown, peculiar environments (e.g., U-shaped or narrow, irregular channels), this paper proposes a reinforcement-learning (RL) driven rapidly-exploring random tree (RRT) method for local path re-planning (RL–RRT). The method uses Sarsa(λ) to optimize the random-tree expansion process of RRT, preserving RRT's random exploration in unknown environments while using Sarsa(λ) to cut the cost of exploring invalid regions. Specifically, subject to the kinematic model constraints of the mobile robot, a reward function, a goal-distance function, and a smoothness objective function are defined for the expanded nodes; this prunes invalid nodes and accelerates exploration, thereby achieving multi-objective decision optimization in path planning. In simulation experiments the method is applied to a variety of unknown, peculiar environments, and the results demonstrate the feasibility, effectiveness, and performance advantages of the RL–RRT algorithm.
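For reference, the expansion policy described above is learned with standard tabular Sarsa(λ) using accumulating eligibility traces; the paper's reward, goal-distance, and smoothness terms enter only through the scalar reward $r_{t+1}$, whose exact shaping is not reproduced here. The update is

$$
\begin{aligned}
\delta_t &= r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t),\\
e_t(s,a) &= \gamma\lambda\, e_{t-1}(s,a) + \mathbf{1}\{s = s_t,\ a = a_t\},\\
Q(s,a) &\leftarrow Q(s,a) + \alpha\, \delta_t\, e_t(s,a) \quad \text{for all } (s,a),
\end{aligned}
$$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, and $\lambda$ the trace-decay parameter.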
English abstract
      In this paper, a local path re-planning method (RL–RRT) that drives the rapidly-exploring random tree (RRT) with reinforcement learning (RL) is proposed to address the low efficiency of path planning for mobile robots in unknown, peculiar environments such as U-shaped and narrow, irregular channels. The method uses Sarsa(λ) to optimize the RRT random-tree expansion process, which preserves the random exploratory nature of RRT in unknown environments while reducing the cost of exploring invalid regions. Specifically, by defining a reward function, a goal-distance function, and a smoothness objective function for the expanded nodes, RL–RRT prunes invalid nodes and accelerates the exploration process while satisfying the constraints of the mobile robot's kinematic model, thereby achieving multi-objective decision optimization for path planning. In the simulation experiments, RL–RRT is applied to a variety of unknown, peculiar environments, and the results demonstrate its feasibility, effectiveness, and performance advantages.
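To make the expansion scheme concrete, below is a minimal Python sketch (not the authors' implementation) of an RRT whose node expansion is guided by tabular Sarsa(λ): the RL state is a coarse spatial bin, the actions are eight discrete expansion headings, and the reward combines progress toward the goal with a smoothness penalty on heading change. The obstacle layout, constants, and all names are illustrative assumptions, and a 2D point robot stands in for the kinematic model used in the paper.

# Minimal sketch (not the authors' implementation) of Sarsa(lambda)-guided RRT expansion.
# Assumed setup: 2D point robot, one circular obstacle, 8 discrete steering actions,
# coarse grid bins as RL states; all constants below are illustrative.
import math
import random
from collections import defaultdict

START, GOAL = (1.0, 1.0), (9.0, 9.0)
OBSTACLES = [(5.0, 5.0, 1.5)]                      # (center_x, center_y, radius)
ACTIONS = [i * math.pi / 4.0 for i in range(8)]    # candidate expansion headings
STEP, ALPHA, GAMMA, LAM, EPS = 0.5, 0.1, 0.95, 0.8, 0.2

Q = defaultdict(float)                             # action-value table Q[(state, action)]
E = defaultdict(float)                             # eligibility traces e[(state, action)]

def state_of(p):                                   # coarse spatial bin used as the RL state
    return (int(p[0]), int(p[1]))

def in_collision(p):
    return any((p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r for cx, cy, r in OBSTACLES)

def choose_action(s):                              # epsilon-greedy policy over headings
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(s, a)])

def reward(old, new, prev_heading, heading):
    # progress toward the goal minus a smoothness penalty on the heading change
    progress = math.dist(old, GOAL) - math.dist(new, GOAL)
    turn = abs(heading - prev_heading)
    return progress - 0.1 * min(turn, 2.0 * math.pi - turn)

nodes, parents = [START], {0: 0}
for _ in range(3000):
    sample = (random.uniform(0.0, 10.0), random.uniform(0.0, 10.0))
    i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))  # nearest node
    node, prev = nodes[i], nodes[parents[i]]
    heading_in = math.atan2(node[1] - prev[1], node[0] - prev[0]) if node != prev else 0.0
    s = state_of(node)
    a = choose_action(s)
    new = (node[0] + STEP * math.cos(ACTIONS[a]), node[1] + STEP * math.sin(ACTIONS[a]))
    if in_collision(new):
        r, s_next = -1.0, s                        # rejected node: penalty, stay in place
    else:
        r, s_next = reward(node, new, heading_in, ACTIONS[a]), state_of(new)
        parents[len(nodes)] = i
        nodes.append(new)                          # accept the node into the random tree
    a_next = choose_action(s_next)
    delta = r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)]   # Sarsa(lambda) TD error
    E[(s, a)] += 1.0
    for key in list(E):                            # propagate the error along the traces
        Q[key] += ALPHA * delta * E[key]
        E[key] *= GAMMA * LAM
    if not in_collision(new) and math.dist(new, GOAL) < STEP:
        break

print(f"tree size: {len(nodes)}, reached goal: {math.dist(nodes[-1], GOAL) < STEP}")

Pairing the TD update with the next ε-greedy action at the newly added node is a simplification of on-policy Sarsa(λ), since the next expansion may start from a different tree node; a closer implementation would also generate candidate nodes with the robot's kinematic model rather than straight-line steps, as the abstract describes.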