Cite this article: ZUO Guo-yu, HE Liu-yuan, WU Qi-fei, YU Shuang-yue, LI Jian-geng. Imitation learning with latent space features generated by conditional variational autoencoders[J]. Control Theory & Applications, 2026, 43(5): 989-1000.
Imitation learning with latent space features generated by conditional variational autoencoders
Received: 2024-03-06  Revised: 2025-11-11
DOI  10.7641/CTA.2025.40139
  2026,43(5):989-1000
Keywords  robot learning  offline imitation learning  latent space  behavioral cloning  conditional variational autoencoders
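Funding  Supported by the National Natural Science Foundation of China (62373016) and the Open Project of the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS–2023–22).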
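Author  Affiliation  E-mail
ZUO Guo-yu*  School of Information Science and Technology, Beijing University of Technology  zuoguoyu@bjut.edu.cn
HE Liu-yuan  School of Information Science and Technology, Beijing University of Technology
WU Qi-fei  School of Information Science and Technology, Beijing University of Technology
YU Shuang-yue  School of Information Science and Technology, Beijing University of Technology
LI Jian-geng  School of Information Science and Technology, Beijing University of Technology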
Abstract
      Tasks in high-dimensional environments are common among complex robot tasks. In such tasks, both the environment information and the robot control data comprise many kinds of data and have high feature dimensionality. Because the distribution of expert demonstration data is complex, existing imitation learning methods struggle to learn a good policy quickly. To address the long training times and limited applicability of imitation learning algorithms caused by high-dimensional environment spaces and complex data distributions, this paper designs an imitation learning algorithm in which a conditional variational autoencoder generates latent space features. The algorithm actively reduces the dimensionality of the environment space, which lowers the complexity of the neural network and speeds up training. It also uses an action-loss prediction network and a perturbation layer to obtain feedback from the output and improve training accuracy. The effectiveness of the proposed algorithm is verified in simulation on the D4RL benchmark, on continuous control tasks for the Microsoft MoCapAct humanoid robot, and on complex manipulation tasks for a humanoid five-fingered robotic hand. The results show that the proposed method trains faster, achieves higher accuracy, and yields a more stable policy.