Cite this article: SONG Yong, LI Yi-bin, LI Cai-hong. Initialization in reinforcement learning for mobile robots path planning[J]. Control Theory & Applications, 2012, 29(12): 1623-1628.
Initialization in reinforcement learning for mobile robots path planning
Received: 2011-10-17    Revised: 2012-07-21
DOI: 10.7641/j.issn.1000-8152.2012.12.CCTA111169
  2012,29(12):1623-1628
Keywords: mobile robots; reinforcement learning; artificial potential field; path planning; Q-value initialization
Funding: Supported by the National Natural Science Foundation of China (61075091, 61174054) and the Young Scientists Fund of the National Natural Science Foundation of China (61105100).
Authors (affiliation, e-mail):
SONG Yong (宋勇), School of Control Science and Engineering, Shandong University; School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai); songyong@sdu.edu.cn
LI Yi-bin (李贻斌)*, corresponding author, School of Control Science and Engineering, Shandong University; liyb@sdu.edu.cn
LI Cai-hong (李彩虹), School of Computer Science and Technology, Shandong University of Technology
Chinese abstract (translated)
To address the slow convergence of existing reinforcement-learning algorithms for robot path planning, we propose an initialization method for mobile-robot reinforcement learning based on an artificial potential field. The robot workspace is modeled as a virtual potential field, and prior knowledge is used to assign a potential value to every point in the field; this value represents the maximum cumulative reward obtainable under the optimal policy. For example, points in obstacle regions have zero potential, and the goal point has the global maximum potential. The initial Q-value is then defined as the immediate reward at the current point plus the maximum discounted cumulative reward of the successor point. With this Q-value initialization, the improved algorithm converges faster and more stably. Finally, the improved algorithm is validated on robot paths in a grid map; the results show that the method raises the learning efficiency of the initial stage and improves the overall performance of the algorithm.
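In symbols (our notation, added here for illustration rather than taken from the paper): writing s' for the successor point reached from point s by action a, r(s, a) for the immediate reward, γ for the discount factor, and V(s') for the potential value that the artificial potential field assigns to s' (zero inside obstacles, globally maximal at the goal), the initialization reads

\[ Q_0(s,a) \;=\; r(s,a) + \gamma\, V(s'). \]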
English abstract
To improve the convergence rate of the standard Q-learning algorithm, we propose an initialization method for the reinforcement learning of a mobile robot, based on an artificial potential field (APF), a virtual field over the robot workspace. The potential energy of each point in the field is specified from prior knowledge and represents the maximum cumulative reward obtainable by following the optimal path policy. In the APF, points corresponding to obstacles have zero potential energy, and the goal point has the global maximum potential energy in the workspace. The initial Q-value is defined as the immediate reward at the current point plus the maximum discounted cumulative reward at the succeeding point obtained by following the optimal path policy. With this Q-value initialization, the improved algorithm converges more rapidly and more steadily than the original algorithm. The proposed algorithm is validated on robot paths in a grid workspace; experimental results show that it improves the learning efficiency in the early stage of learning and the overall performance of the algorithm.
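As an illustration only, the following Python sketch shows one way such a Q-value initialization could be realized in a small grid workspace; the grid layout, reward values, and the concrete potential function below are our own assumptions, not the paper's exact setup.

# Minimal sketch of APF-based Q-value initialization in an assumed 5x5 grid world.
import numpy as np

ROWS, COLS = 5, 5
GOAL = (4, 4)
OBSTACLES = {(1, 1), (2, 3)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
GAMMA = 0.9                                        # discount factor (assumed)
R_GOAL, R_OBST, R_STEP = 1.0, -1.0, 0.0            # assumed immediate rewards

def potential(state):
    """Assumed potential value: zero at obstacles, maximal (1.0) at the goal,
    decaying with Manhattan distance elsewhere; a stand-in for the
    prior-knowledge potential field described in the abstract."""
    if state in OBSTACLES:
        return 0.0
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return GAMMA ** dist

def step(state, action):
    """Deterministic grid transition; moves that leave the grid keep the robot in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return state
    return nxt

def immediate_reward(next_state):
    if next_state == GOAL:
        return R_GOAL
    if next_state in OBSTACLES:
        return R_OBST
    return R_STEP

# Q0(s, a) = r(s, a) + gamma * V(s'), with V taken from the potential field.
Q = np.zeros((ROWS, COLS, len(ACTIONS)))
for r in range(ROWS):
    for c in range(COLS):
        for k, a in enumerate(ACTIONS):
            s_next = step((r, c), a)
            Q[r, c, k] = immediate_reward(s_next) + GAMMA * potential(s_next)

# Standard Q-learning would now start from this Q instead of from all zeros.
print(Q[0, 0])   # initial Q-values of the four actions at the start cell

Starting the learning from this Q table, rather than from zeros, is what the abstract credits for the faster and more stable convergence in the early stage.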