Cite this article: SONG Yong, LI Yi-bin, LI Cai-hong. Initialization in reinforcement learning for mobile robots path planning[J]. Control Theory & Applications, 2012, 29(12): 1623-1628.
Initialization in reinforcement learning for mobile robots path planning
Received: 2011-10-17    Revised: 2012-07-21
DOI: 10.7641/j.issn.1000-8152.2012.12.CCTA111169
  2012,29(12):1623-1628
Keywords: mobile robots; reinforcement learning; artificial potential field; path planning; Q-value initialization
Funding: Supported by the National Natural Science Foundation of China (61075091, 61174054) and the Young Scientists Fund of the National Natural Science Foundation of China (61105100).
Authors (affiliation, e-mail):
SONG Yong (宋勇), School of Control Science and Engineering, Shandong University; School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai); songyong@sdu.edu.cn
LI Yi-bin (李贻斌)*, corresponding author, School of Control Science and Engineering, Shandong University; liyb@sdu.edu.cn
LI Cai-hong (李彩虹), School of Computer Science and Technology, Shandong University of Technology
Chinese abstract (translated)
To address the slow convergence of existing reinforcement-learning algorithms for robot path planning, we propose an initialization method for mobile-robot reinforcement learning based on an artificial potential field. The robot workspace is modeled as a virtual potential field, and prior knowledge is used to assign a potential value to every point in the field; this value represents the maximum cumulative reward obtainable under the optimal policy. For example, points in obstacle regions have zero potential, and the goal point has the global maximum potential. The initial Q-value is then defined as the immediate reward at the current point plus the maximum discounted cumulative reward of the successor point. With this Q-value initialization, the improved algorithm converges faster and more stably. Finally, the improved algorithm is validated on robot paths in a grid map; the results show that the method raises the learning efficiency of the initial stage and improves the overall performance of the algorithm.
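In symbols (our notation, added here for illustration rather than taken from the paper): writing s' for the successor point reached from point s by action a, r(s, a) for the immediate reward, γ for the discount factor, and V(s') for the potential value that the artificial potential field assigns to s' (zero inside obstacles, globally maximal at the goal), the initialization reads

\[ Q_0(s,a) \;=\; r(s,a) + \gamma\, V(s'). \]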
English abstract
To improve the convergence rate of the standard Q-learning algorithm, we propose an initialization method for the reinforcement learning of a mobile robot, based on an artificial potential field (APF), a virtual field over the robot workspace. The potential energy of each point in the field is specified from prior knowledge and represents the maximum cumulative reward obtainable by following the optimal path policy. In the APF, points corresponding to obstacles have zero potential energy, and the goal point has the global maximum potential energy in the workspace. The initial Q-value is defined as the immediate reward at the current point plus the maximum discounted cumulative reward at the succeeding point obtained by following the optimal path policy. With this Q-value initialization, the improved algorithm converges more rapidly and more steadily than the original algorithm. The proposed algorithm is validated on robot paths in a grid workspace; experimental results show that it improves the learning efficiency in the early stage of learning and the overall performance of the algorithm.
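As an illustration only, the following Python sketch shows one way such a Q-value initialization could be realized in a small grid workspace; the grid layout, reward values, and the concrete potential function below are our own assumptions, not the paper's exact setup.

# Minimal sketch of APF-based Q-value initialization in an assumed 5x5 grid world.
import numpy as np

ROWS, COLS = 5, 5
GOAL = (4, 4)
OBSTACLES = {(1, 1), (2, 3)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
GAMMA = 0.9                                        # discount factor (assumed)
R_GOAL, R_OBST, R_STEP = 1.0, -1.0, 0.0            # assumed immediate rewards

def potential(state):
    """Assumed potential value: zero at obstacles, maximal (1.0) at the goal,
    decaying with Manhattan distance elsewhere; a stand-in for the
    prior-knowledge potential field described in the abstract."""
    if state in OBSTACLES:
        return 0.0
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return GAMMA ** dist

def step(state, action):
    """Deterministic grid transition; moves that leave the grid keep the robot in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return state
    return nxt

def immediate_reward(next_state):
    if next_state == GOAL:
        return R_GOAL
    if next_state in OBSTACLES:
        return R_OBST
    return R_STEP

# Q0(s, a) = r(s, a) + gamma * V(s'), with V taken from the potential field.
Q = np.zeros((ROWS, COLS, len(ACTIONS)))
for r in range(ROWS):
    for c in range(COLS):
        for k, a in enumerate(ACTIONS):
            s_next = step((r, c), a)
            Q[r, c, k] = immediate_reward(s_next) + GAMMA * potential(s_next)

# Standard Q-learning would now start from this Q instead of from all zeros.
print(Q[0, 0])   # initial Q-values of the four actions at the start cell

Starting the learning from this Q table, rather than from zeros, is what the abstract credits for the faster and more stable convergence in the early stage.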