Cite this article: ZHONG Hao-jun, WANG Zhen-lei. PID parameter optimization based on TD3 algorithm of double replay buffer[J]. Control Theory & Applications, 2026, 43(1): 139-148.
PID parameter optimization based on TD3 algorithm of double replay buffer
Received: 2023-11-09  Revised: 2025-07-18
DOI: 10.7641/CTA.2024.30730
2026, 43(1): 139-148
Keywords: PID parameter optimization; deep reinforcement learning; TD3
Funding: National Natural Science Foundation of China (62293501, 62173147).
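Authors, affiliations and e-mail:
ZHONG Hao-jun (钟皓俊), Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology; zhong_haojun2022@163.com
WANG Zhen-lei (王振雷)*, Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology; wangzhen_l@ecust.edu.cn
* Corresponding author.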
Abstract
      PID controllers are widely used in industrial control, but the selection of their parameters relies heavily on manual experience, which makes tuning inefficient and tedious. In recent years, deep reinforcement learning (DRL) has been applied successfully in many fields because of its ability to learn autonomously in complex environments. This paper proposes a PID parameter optimization method based on the twin delayed deep deterministic policy gradient (TD3) algorithm with a double experience replay buffer, which uses DRL to optimize the parameters of the PID controller autonomously. Throughout the optimization, the control problem is treated as a sequential decision process: by designing the agent's state space, action space and network structure, the optimization of the PID parameters is transformed into the updating of the weights of the reinforcement learning policy network. In addition, to address the low exploration efficiency of TD3 in the early stage of training, a double experience replay buffer mechanism is added to TD3 to improve training efficiency in that early stage. Finally, simulations are carried out on a second-order system and a first-order plus time-delay system, and the method is compared with PID parameter optimization based on the particle swarm optimization (PSO) algorithm. The experimental results show that the PID parameters obtained by the proposed algorithm give the controller better control performance than those obtained by PSO.
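The abstract does not say how the two replay buffers are filled or sampled. As an illustration only, the following Python sketch assumes one plausible design: a regular FIFO buffer that stores every transition, plus a smaller "elite" buffer that additionally keeps transitions whose reward reaches a threshold, with each minibatch drawing a fixed fraction from the elite buffer. The class name `DoubleReplayBuffer`, the reward-threshold criterion and the `elite_ratio` parameter are hypothetical and are not taken from the paper. In a TD3 setting, the action stored in each transition would be the PID gain vector (Kp, Ki, Kd) output by the policy network, and the sampled minibatches would feed the twin critics and the delayed actor update.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical two-buffer replay memory for an off-policy agent such as TD3.

    Assumed design (not taken from the paper):
      * `regular` keeps every transition in FIFO order.
      * `elite` additionally keeps transitions whose reward >= `threshold`.
      * `sample` draws about `elite_ratio` of each minibatch from `elite`
        (when it holds data) and fills the remaining slots from `regular`.
    """

    def __init__(self, capacity, elite_capacity, threshold, elite_ratio=0.5, seed=None):
        self.regular = deque(maxlen=capacity)      # all transitions; oldest dropped first
        self.elite = deque(maxlen=elite_capacity)  # high-reward transitions only
        self.threshold = threshold
        self.elite_ratio = elite_ratio
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.regular.append(transition)
        if reward >= self.threshold:
            self.elite.append(transition)

    def sample(self, batch_size):
        if not self.regular:
            raise ValueError("cannot sample from an empty buffer")
        # Elite share of the batch, capped by the number of stored elite transitions.
        n_elite = min(int(batch_size * self.elite_ratio), len(self.elite))
        batch = self.rng.choices(self.elite, k=n_elite) if n_elite else []
        # Remaining slots come from the regular buffer (sampling with replacement).
        batch += self.rng.choices(self.regular, k=batch_size - n_elite)
        return batch
```

Early in training, few transitions have high reward. Oversampling them is one way a two-buffer scheme could speed up learning at that stage. Under this sketch the threshold and `elite_ratio` would need tuning, and the ratio might be annealed toward zero as the regular buffer accumulates useful experience.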