| Cite this article: | QI Jia-xin, MENG Gui-zhi. Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning[J]. Control Theory & Applications, 2025, 42(9): 1807-1817. |
|
| Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning |
| Abstract views: 2590 Full-text views: 176 Received: 2023-12-14 Revised: 2025-02-11 |
| DOI: 10.7641/CTA.2024.30806 |
| 2025,42(9):1807-1817 |
| Keywords: output regulation; optimal control; reinforcement learning; backstepping |
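| Funding: Supported by the Heilongjiang Province "Hundred-Thousand-Ten Thousand" Engineering Science and Technology Major Special Project (2020ZX10A03), the Heilongjiang Province Key R&D Program Guidance Project (GZ20220106), and the National Foreign Expert Project (G2022012008). |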
|
| Abstract |
| This paper addresses the optimal output regulation problem for a class of nonlinear systems with unknown nonlinear
functions and external disturbances, driven by a linear, neutrally stable exosystem. An adaptive optimal control strategy
is proposed that combines the actor-critic network algorithm from reinforcement learning with the backstepping method.
First, under the condition that the regulator equations are solvable, a coordinate transformation converts the output
regulation problem of the uncertain nonlinear system into a stabilization problem. Radial basis function neural networks
are used to approximate the unknown nonlinear functions, and a neural network adaptive observer with an internal model
is designed to estimate the unmeasurable states. Then, an adaptive internal model based on reinforcement learning is
designed and a cost function associated with the internal model is introduced. An approximate optimal algorithm based on
the actor-critic network is applied at every step of the backstepping design, so that all virtual controllers are optimal,
while the dynamic surface technique avoids the "explosion of complexity" problem of backstepping. Finally, the resulting
optimal adaptive output feedback controller not only optimizes the proposed value function but also ensures that all
closed-loop signals are semi-globally uniformly ultimately bounded and that the tracking error stays within any desired
accuracy. Numerical simulations verify the effectiveness of the proposed method. |
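The abstract relies on radial basis function neural networks to approximate unknown nonlinear functions. The following is only a minimal sketch of that idea, assuming Gaussian basis functions, a scalar input, and a plain gradient (LMS-style) weight update. The centers, width, gain, epoch count, and the target function `math.sin` are illustrative choices that do not come from the paper, and the update rule is a stand-in, not the adaptive law derived by the authors.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian basis vector phi(x); centers and width are assumed, not from the paper."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def rbf_output(weights, phi):
    """Network output W^T phi(x)."""
    return sum(w * p for w, p in zip(weights, phi))

def rms_error(weights, samples, centers, width, target):
    """Root-mean-square approximation error over the sample grid."""
    sq = [(target(x) - rbf_output(weights, rbf_features(x, centers, width))) ** 2
          for x in samples]
    return math.sqrt(sum(sq) / len(sq))

def train(target, centers, width, samples, gain=0.2, epochs=200):
    """Gradient (LMS-style) weight update: W <- W + gain * error * phi(x).
    This is a hypothetical stand-in for an adaptive law, used only for illustration."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x in samples:
            phi = rbf_features(x, centers, width)
            err = target(x) - rbf_output(weights, phi)
            weights = [w + gain * err * p for w, p in zip(weights, phi)]
    return weights

# Illustrative setup: approximate sin(x) on [-2, 2] with 9 evenly spaced centers.
centers = [-2.0 + 0.5 * i for i in range(9)]
samples = [-2.0 + 0.1 * i for i in range(41)]
width = 0.5

initial_rms = rms_error([0.0] * len(centers), samples, centers, width, math.sin)
weights = train(math.sin, centers, width, samples)
final_rms = rms_error(weights, samples, centers, width, math.sin)
print(f"RMS error before training: {initial_rms:.4f}, after: {final_rms:.4f}")
```

In a control setting the weights would be updated online from measured signals, not from a known target function. The offline fit here only shows how a weighted sum of Gaussian bases can represent a smooth nonlinearity.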