PID parameter optimization based on TD3 algorithm of double replay buffer

ZHONG Hao-jun; WANG Zhen-lei

Cite this article: ZHONG Hao-jun, WANG Zhen-lei. PID parameter optimization based on TD3 algorithm of double replay buffer[J]. Control Theory & Applications, 2026, 43(1): 139-148.

Received: 2023-11-09; Revised: 2025-07-18

DOI: 10.7641/CTA.2024.30730

2026,43(1):139-148

Keywords: PID parameter optimization; deep reinforcement learning; TD3

Funding: National Natural Science Foundation of China (62293501, 62173147).

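Author	Affiliation	E-mail
ZHONG Hao-jun	Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology	zhong_haojun2022@163.com
WANG Zhen-lei^*	Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology	wangzhen_l@ecust.edu.cn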

Abstract

PID controllers are widely used in industrial control, but selecting their parameters relies heavily on manual experience, which is inefficient and tedious. In recent years, deep reinforcement learning has been applied successfully in many fields because of its ability to learn autonomously in complex environments. This paper proposes a PID parameter optimization method based on the twin delayed deep deterministic policy gradient (TD3) algorithm with a double experience replay buffer, which uses deep reinforcement learning to optimize the parameters of a PID controller autonomously. Throughout the optimization, the control problem is treated as a sequential decision process: by designing the agent's state space, action space, and network structure, the optimization of the PID parameters is transformed into the updating of the weights of the agent's policy network. In addition, to address the low exploration efficiency of TD3 in the early stage of training, a double experience replay buffer mechanism is added to TD3, which improves efficiency in that stage. Finally, simulations are carried out on a second-order system and a first-order plus dead-time system, and the method is compared with PID parameter optimization based on the particle swarm optimization (PSO) algorithm. The results show that the PID parameters obtained by the proposed algorithm give better control performance than those obtained by PSO.
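The idea of turning PID tuning into a weight update can be illustrated with a minimal sketch. It relies on one certain fact: the PID control law is linear in the error, its integral, and its derivative, so the three gains can be read as the weights of a single linear layer. Everything else here is an assumption made for illustration and is not taken from the paper: the state layout `[e, integral, derivative]`, the function names `pid_policy` and `simulate`, the plant G(s) = 1/(s^2 + 2s + 1), the gain values, and the Euler discretization. The sketch contains no TD3 training loop and no replay buffers; it only shows the controller that such an agent would tune.

```python
import numpy as np


def pid_policy(state, weights):
    """Linear 'policy': u = Kp*e + Ki*integral(e) + Kd*de/dt.

    state   = [e, integral of e, derivative of e]  (assumed layout)
    weights = [Kp, Ki, Kd]                          (the PID gains)
    """
    return float(np.dot(weights, state))


def simulate(weights, setpoint=1.0, dt=0.01, steps=2000):
    """Closed-loop step response of a hypothetical plant G(s) = 1/(s^2 + 2s + 1)."""
    y, dy = 0.0, 0.0                  # plant output and its time derivative
    integral = 0.0
    prev_e = setpoint - y             # avoids a derivative kick on the first step
    outputs = []
    for _ in range(steps):
        e = setpoint - y
        integral += e * dt
        derivative = (e - prev_e) / dt
        prev_e = e
        u = pid_policy(np.array([e, integral, derivative]), weights)
        # Euler integration of y'' + 2*y' + y = u
        ddy = u - 2.0 * dy - y
        dy += ddy * dt
        y += dy * dt
        outputs.append(y)
    return np.array(outputs)


# Illustrative gains only; in an RL setting these would be the learned weights.
weights = np.array([2.0, 1.0, 0.5])   # [Kp, Ki, Kd]
response = simulate(weights)
print("final output:", response[-1])
```

Under this reading, changing the PID gains and changing the layer weights are the same operation, so any method that updates the weights from closed-loop data is effectively tuning the controller.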