| Cite this article: | ZUO Guo-yu, HE Liu-yuan, WU Qi-fei, YU Shuang-yue, LI Jian-geng. Imitation learning with latent space features generated by conditional variational autoencoders[J]. Control Theory & Applications, 2026, 43(5): 989-1000. |
|
| Imitation learning with latent space features generated by conditional variational autoencoders |
| Received: 2024-03-06 Revised: 2025-11-11 |
| DOI 10.7641/CTA.2025.40139 |
| 2026,43(5):989-1000 |
| Keywords: robot learning; offline imitation learning; latent space; behavioral cloning; conditional variational autoencoders |
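| Funding: Supported by the National Natural Science Foundation of China (62373016) and the Open Project of the State Key Laboratory of Multimodal Artificial Intelligence (MAIS–2023–22). |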
|
| Abstract |
| Tasks in high-dimensional environments are common among complex tasks. In such tasks, both the
information about the task environment and the control data of the robot comprise many types of data and
have high feature dimensionality. Because expert demonstration data follow a complex distribution, existing
imitation learning methods struggle to learn a good policy quickly. To address the long training times and
limited applicability that high-dimensional environment spaces and complex data distributions impose on
imitation learning, this paper designs an imitation learning algorithm that uses a conditional variational
autoencoder to generate latent space features. Actively reducing the dimensionality of the environment space
lowers the complexity of the neural network and speeds up training, while an action loss prediction network
and a perturbation layer obtain feedback from the output to improve training accuracy. The algorithm is
evaluated on the D4RL benchmark, on continuous control tasks with the Microsoft MoCapAct humanoid robot, and
on simulated complex manipulation tasks with a humanoid five-fingered robotic hand. The results show that the
proposed method trains faster, achieves higher accuracy, and yields a more stable policy. |
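
The abstract does not state the training objective, so the following is only a minimal sketch of the generic conditional variational autoencoder loss as it is commonly used for behavioral cloning: a reconstruction error on the expert action plus a KL term that pulls the latent code toward a standard normal prior. The function names, the mean-squared reconstruction error, and the weight `beta` are assumptions made for illustration. They are not taken from the paper, and the action loss prediction network and perturbation layer are not modeled here.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent sample."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def cvae_bc_loss(action, recon_action, mu, log_var, beta=0.5):
    """Hypothetical per-sample loss: action reconstruction MSE + beta * KL.

    action        expert action from the demonstration data
    recon_action  action decoded from (latent code, state condition)
    mu, log_var   encoder outputs describing the latent Gaussian
    beta          KL weight (an arbitrary illustrative value)
    """
    mse = sum((a - r) ** 2 for a, r in zip(action, recon_action)) / len(action)
    return mse + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Perfect reconstruction with a latent equal to the prior gives zero loss.
    print(cvae_bc_loss([1.0, 2.0], [1.0, 2.0], [0.0], [0.0]))
```

In this sketch the state enters only as the condition fed to the encoder and decoder that produce `mu`, `log_var`, and `recon_action`; those networks are left out to keep the example self-contained.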