Imitation learning with latent space features generated by conditional variational autoencoders

ZUO Guo-yu; HE Liu-yuan; WU Qi-fei; YU Shuang-yue; LI Jian-geng

Cite this article: ZUO Guo-yu, HE Liu-yuan, WU Qi-fei, YU Shuang-yue, LI Jian-geng. Imitation learning with latent space features generated by conditional variational autoencoders[J]. Control Theory & Applications, 2026, 43(5): 989-1000.

Submitted: 2024-03-06; Revised: 2025-11-11

DOI: 10.7641/CTA.2025.40139

2026,43(5):989-1000

Keywords: robot learning; offline imitation learning; latent space; behavioral cloning; conditional variational autoencoders

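Funding: Supported by the National Natural Science Foundation of China (62373016) and the Open Project of the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS–2023–22).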

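Author	Affiliation	E-mail
ZUO Guo-yu^*	School of Information Science and Technology, Beijing University of Technology	zuoguoyu@bjut.edu.cn
HE Liu-yuan	School of Information Science and Technology, Beijing University of Technology
WU Qi-fei	School of Information Science and Technology, Beijing University of Technology
YU Shuang-yue	School of Information Science and Technology, Beijing University of Technology
LI Jian-geng	School of Information Science and Technology, Beijing University of Technology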

Abstract

Tasks in high-dimensional environments are common among complex tasks. In such tasks, both the environment information and the robot control data comprise many kinds of signals and therefore have high feature dimensionality. Because expert demonstration data follow complex distributions, existing imitation learning methods struggle to learn good policies quickly. To address the long training times and limited applicability that high-dimensional environment spaces and complex data distributions impose on imitation learning, this paper designs an imitation learning algorithm in which a conditional variational autoencoder generates latent space features. Actively reducing the dimensionality of the environment space lowers the complexity of the neural network and speeds up training, while an action loss prediction network and a perturbation layer obtain feedback from the output to improve training accuracy. The algorithm is validated on the D4RL benchmark, on simulated continuous control tasks for the Microsoft MoCapAct humanoid robot, and on simulated complex manipulation tasks for a humanoid five-fingered robotic hand. The results show that the proposed method trains faster, achieves higher accuracy, and yields more stable policies.
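To make the core idea concrete, the following is a minimal sketch of a generic conditional variational autoencoder objective for behavioral cloning, written with NumPy only. It is not the architecture described in the paper: the class `TinyCVAE`, the helper `gaussian_kl`, the linear encoder and decoder, the dimensions, and the weight `beta` are all illustrative assumptions, and the action loss prediction network and perturbation layer are not modeled. The sketch only shows how an action can be compressed into a low-dimensional latent variable conditioned on the state and then reconstructed from it.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

class TinyCVAE:
    """Hypothetical linear CVAE: encoder q(z | s, a), decoder p(a | s, z)."""

    def __init__(self, state_dim, action_dim, latent_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        enc_in = state_dim + action_dim
        dec_in = state_dim + latent_dim
        # Small random weights; a real model would use trained deep networks.
        self.W_mu = self.rng.normal(scale=0.1, size=(enc_in, latent_dim))
        self.W_logvar = self.rng.normal(scale=0.1, size=(enc_in, latent_dim))
        self.W_dec = self.rng.normal(scale=0.1, size=(dec_in, action_dim))

    def encode(self, state, action):
        # Map (state, action) to the mean and log-variance of the latent Gaussian.
        x = np.concatenate([state, action], axis=-1)
        return x @ self.W_mu, x @ self.W_logvar

    def decode(self, state, z):
        # Reconstruct an action in [-1, 1] from the state and the latent code.
        return np.tanh(np.concatenate([state, z], axis=-1) @ self.W_dec)

    def loss(self, state, action, beta=1.0):
        mu, logvar = self.encode(state, action)
        eps = self.rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        recon_err = np.sum((self.decode(state, z) - action) ** 2, axis=-1)
        # Negative ELBO up to constants: reconstruction error plus weighted KL.
        return float(np.mean(recon_err + beta * gaussian_kl(mu, logvar)))

model = TinyCVAE(state_dim=17, action_dim=6, latent_dim=2)
states, actions = np.zeros((4, 17)), np.zeros((4, 6))
batch_loss = model.loss(states, actions)
```

In this sketch the latent dimension (2) is much smaller than the state dimension (17), which mirrors the dimensionality reduction the abstract describes; the specific numbers are placeholders rather than values from the paper.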