A Model-Transfer Trajectory Planning Method for Intelligent Vehicles Based on Deep Reinforcement Learning

Yu Lingli; Shao Xuanya; Long Ziwei; Wei Yadong; Zhou Kaijun

Cite this article: Yu Lingli, Shao Xuanya, Long Ziwei, Wei Yadong, Zhou Kaijun. Intelligent land vehicle model transfer trajectory planning method of deep reinforcement learning[J]. Control Theory & Applications, 2019, 36(9): 1409-1422.


Received: 2018-05-09; Revised: 2018-11-16

DOI: 10.7641/CTA.2018.80341

2019,36(9):1409-1422

Keywords: path planning; intelligent land vehicle; reinforcement learning; deep learning; vehicle kinematics model

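Funding: Supported by the National Key Research and Development Program of China (2018YFB1201602), the Major Science and Technology Project of Hunan Province (2017GK1010), the Natural Science Foundation of Hunan Province (2018JJ2531, 2018JJ2197), the National Natural Science Foundation of China (61403426), and the Key Projects of the State Key Laboratory Open Fund (SKLRS-2017-KF-13, SKLMT-KFKT-201602).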

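Author	Affiliation	E-mail
Yu Lingli*	Central South University	llyu@csu.edu.cn
Shao Xuanya	Central South University
Long Ziwei	Central South University
Wei Yadong	Central South University
Zhou Kaijun	Hunan University of Commerce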

Abstract

To address the tracking error and excessive dependence associated with the vehicle model in traditional path planning for intelligent vehicles, this paper proposes a trajectory planning method for intelligent vehicles based on model transfer with deep reinforcement learning. First, an abstract model of the real environment is extracted; within it, the deep deterministic policy gradient (DDPG) algorithm and a vehicle dynamics model are used together to train a reinforcement learning model that approximates optimal intelligent driving. Second, a model transfer strategy maps the real-scenario problem into the virtual abstract model, where the deep reinforcement learning model trained in that environment computes the control and trajectory sequences. Then, the optimal trajectory sequence is selected according to an evaluation function defined in the real environment. Experimental results show that the proposed method handles continuous input states and generates continuous steering-angle control sequences, which reduces the lateral tracking error. Model transfer also improves the generalization performance of the model and lessens the excessive-dependence problem.
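The last step of the pipeline, selecting the best trajectory with an evaluation function, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the kinematic bicycle model (the abstract refers to a vehicle dynamics model), the speed, wheelbase and time step, the function names `rollout`, `lateral_cost` and `select_best`, and the mean squared lateral offset used as the evaluation function. The candidate steering sequences are hand-written stand-ins for what a trained DDPG policy would output.

```python
import math


def rollout(state, steering_seq, v=5.0, wheelbase=2.7, dt=0.1):
    """Integrate a kinematic bicycle model (assumed, not from the paper).

    state is (x, y, yaw); returns the list of visited (x, y, yaw) poses.
    """
    x, y, yaw = state
    traj = [(x, y, yaw)]
    for delta in steering_seq:
        # Forward-Euler step at constant speed v with steering angle delta.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v / wheelbase * math.tan(delta) * dt
        traj.append((x, y, yaw))
    return traj


def lateral_cost(traj, ref_y=0.0):
    """Hypothetical evaluation function: mean squared offset from y = ref_y."""
    return sum((y - ref_y) ** 2 for _, y, _ in traj) / len(traj)


def select_best(state, candidates, cost=lateral_cost):
    """Roll out every candidate control sequence and keep the cheapest one."""
    scored = [(cost(rollout(state, seq)), i) for i, seq in enumerate(candidates)]
    _, best = min(scored)
    return candidates[best], rollout(state, candidates[best])


if __name__ == "__main__":
    # Vehicle starts 1 m to the left of the reference line y = 0.
    start = (0.0, 1.0, 0.0)
    # Stand-ins for policy outputs: steer left, go straight, steer right.
    candidates = [[0.1] * 20, [0.0] * 20, [-0.1] * 20]
    best_seq, best_traj = select_best(start, candidates)
    print("chosen steering angle:", best_seq[0])
    print("cost:", round(lateral_cost(best_traj), 3))
```

In this toy setup the evaluation function only penalizes lateral offset; a real evaluation function would likely also weigh heading error, curvature, and obstacle clearance.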