Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning

QI Jia-xin; MENG Gui-zhi

Cite this article: QI Jia-xin, MENG Gui-zhi. Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning[J]. Control Theory & Applications, 2025, 42(9): 1807-1817.

Abstract views: 2590; Full-text views: 176; Received: 2023-12-14; Revised: 2025-02-11

DOI: 10.7641/CTA.2024.30806

2025,42(9):1807-1817

Keywords: output regulation; optimal control; reinforcement learning; backstepping

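Funding: Supported by the Heilongjiang Province "Hundred-Thousand-Ten Thousand" Engineering Science and Technology Major Special Project (2020ZX10A03), the Heilongjiang Province Key Research and Development Program Guidance Project (GZ20220106), and the National Foreign Expert Project (G2022012008).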

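Author	Affiliation	E-mail
QI Jia-xin	School of Science, Harbin University of Science and Technology	qjxin999@163.com
MENG Gui-zhi^*	School of Science, Harbin University of Science and Technology	menggz13@163.com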

Abstract

This paper addresses the optimal output regulation problem for a class of nonlinear systems with unknown nonlinear functions and external disturbances, driven by a linear neutrally stable exosystem, and proposes an adaptive optimal control strategy based on the actor-critic algorithm in reinforcement learning combined with backstepping. First, under the condition that the regulator equations are solvable, a coordinate transformation converts the output regulation problem of the uncertain nonlinear system into a stabilization problem. Radial basis function neural networks are used to approximate the unknown nonlinear functions, and a neural network adaptive observer with an internal model is designed to estimate the unmeasurable states. Then, an adaptive internal model based on reinforcement learning is designed and a cost function associated with the internal model is proposed. An approximate optimal algorithm based on the actor-critic network is applied at every step of the backstepping procedure, which guarantees that all virtual controllers are optimal, while the dynamic surface technique avoids the "explosion of complexity" problem in backstepping. Finally, the designed optimal adaptive output feedback controller not only optimizes the proposed value function but also ensures that all signals of the closed-loop system are semi-globally uniformly ultimately bounded and that the tracking error stays within an arbitrarily prescribed accuracy. Numerical simulations verify the effectiveness of the proposed method.
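As a rough illustration of the radial basis function approximation step mentioned in the abstract, the sketch below fits a Gaussian RBF network to a scalar function. Everything in it is an assumption made for illustration: the target `unknown_f`, the centers, the width, and the plain gradient (LMS) weight update are hypothetical stand-ins and are not the system, network, or adaptive law used in the paper.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian basis vector S(x) for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (width ** 2)) for c in centers]

def rbf_output(weights, features):
    """Network output W^T S(x)."""
    return sum(w * s for w, s in zip(weights, features))

def train_rbf(target, centers, width, samples, rate=0.2, epochs=500):
    """Fit the weights with a plain gradient (LMS) step on the squared error.
    This is a generic update rule, not the adaptive law from the paper."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x in samples:
            s = rbf_features(x, centers, width)
            err = target(x) - rbf_output(weights, s)
            weights = [w + rate * err * si for w, si in zip(weights, s)]
    return weights

def max_abs_error(target, weights, centers, width, samples):
    """Largest approximation error over the sample grid."""
    return max(
        abs(target(x) - rbf_output(weights, rbf_features(x, centers, width)))
        for x in samples
    )

def unknown_f(x):
    """Hypothetical stand-in for an unknown nonlinearity."""
    return math.sin(x) + 0.5 * x

centers = [-2.0 + 0.5 * i for i in range(9)]    # nine centers on [-2, 2]
width = 0.7                                     # assumed Gaussian width
samples = [-2.0 + 0.1 * i for i in range(41)]   # grid over the same interval

zero_weights = [0.0] * len(centers)
error_before = max_abs_error(unknown_f, zero_weights, centers, width, samples)
weights = train_rbf(unknown_f, centers, width, samples)
error_after = max_abs_error(unknown_f, weights, centers, width, samples)
print(error_before, error_after)
```

The approximation only holds on the compact interval covered by the centers, which is in line with the semi-global nature of the boundedness result stated in the abstract.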