| Cite this article: | ZHONG Hao-jun, WANG Zhen-lei. PID parameter optimization based on TD3 algorithm of double replay buffer[J]. Control Theory & Applications, 2026, 43(1): 139-148. |
|
| PID parameter optimization based on TD3 algorithm of double replay buffer |
| Received: 2023-11-09 Revised: 2025-07-18 |
| DOI 10.7641/CTA.2024.30730 |
| 2026,43(1):139-148 |
| Keywords: PID parameter optimization; deep reinforcement learning; TD3 |
| Funding: National Natural Science Foundation of China (62293501, 62173147). |
|
| Abstract |
| PID controllers are widely used in industrial control, but selecting their parameters relies heavily on manual
experience, which is inefficient and tedious. In recent years, deep reinforcement learning has been applied successfully
in many fields because it can learn on its own in complex environments. This paper proposes a PID parameter optimization
method based on the twin delayed deep deterministic policy gradient (TD3) algorithm with a double experience replay
buffer, which uses deep reinforcement learning to optimize the PID controller parameters autonomously. Throughout the
optimization, the control problem is treated as a sequential decision process. By designing the agent's state space,
action space, and network structure, the optimization of the PID parameters is transformed into the updating of the
policy network weights. In addition, to address the low exploration efficiency of TD3 in the early stage of training,
a double experience replay buffer mechanism is added to TD3, which improves early-stage training efficiency. Finally,
simulations are carried out on a second-order system and a first-order plus time-delay system, and the method is
compared with PID parameter optimization based on particle swarm optimization (PSO). The results show that the PID
parameters obtained by the proposed algorithm give better control performance than those obtained by PSO. |
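
Any optimizer that tunes PID gains, whether TD3 or PSO, needs a way to score a candidate set of gains on a simulated plant. The sketch below is only an illustration of that scoring step and is not the authors' implementation. The function name `simulate_pid`, the second-order plant coefficients, the example gains, and the ITAE cost are all assumptions chosen for this example; the abstract does not state which plant parameters or performance index the paper uses.

```python
def simulate_pid(kp, ki, kd, a1=1.0, a0=1.0, b0=1.0,
                 setpoint=1.0, dt=0.01, t_end=20.0):
    """Simulate a step response and return (itae, final_output).

    Hypothetical plant (not from the paper): y'' + a1*y' + a0*y = b0*u,
    integrated with a simple semi-implicit Euler scheme.
    ITAE = sum of t * |error| * dt; lower means better tracking.
    """
    y = 0.0                      # plant output
    dy = 0.0                     # plant output derivative
    integral = 0.0               # running integral of the error
    prev_error = setpoint - y    # avoids a derivative spike on the first step
    itae = 0.0

    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        error = setpoint - y

        # Discrete PID law: proportional + integral + derivative terms.
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        prev_error = error

        # Advance the assumed second-order plant by one time step.
        ddy = b0 * u - a1 * dy - a0 * y
        dy += ddy * dt
        y += dy * dt

        itae += t * abs(error) * dt

    return itae, y


if __name__ == "__main__":
    # Two arbitrary gain sets, chosen only to show how candidates are compared.
    for gains in [(0.5, 0.1, 0.0), (2.0, 1.0, 0.5)]:
        cost, final_y = simulate_pid(*gains)
        print(f"gains={gains} ITAE={cost:.3f} final output={final_y:.3f}")
```

In a reinforcement learning setup of the kind the abstract describes, a score like this (or a per-step reward derived from the tracking error) would feed back to the agent, while a PSO baseline would use it directly as the fitness of each particle.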