Cite this article: QI Jia-xin, MENG Gui-zhi. Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning[J]. Control Theory & Applications, 2025, 42(9): 1807-1817.
Optimal output regulation for a class of uncertain nonlinear systems based on reinforcement learning
Received: 2023-12-14  Revised: 2025-02-11
DOI  10.7641/CTA.2024.30806
  2025,42(9):1807-1817
Keywords  output regulation  optimal control  reinforcement learning  backstepping
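Funding  Supported by the Heilongjiang Province "Hundred, Thousand, Ten Thousand" Engineering Science and Technology Major Special Project (2020ZX10A03), the Heilongjiang Province Key Research and Development Program Guidance Project (GZ20220106), and the National Foreign Experts Program (G2022012008).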
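Author  Affiliation  E-mail
QI Jia-xin  School of Science, Harbin University of Science and Technology  qjxin999@163.com
MENG Gui-zhi*  School of Science, Harbin University of Science and Technology  menggz13@163.com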
Abstract
      This paper addresses the optimal output regulation problem for a class of nonlinear systems with unknown nonlinear functions and external disturbances, driven by a linear, neutrally stable exosystem. An adaptive optimal control strategy is proposed that combines the actor-critic network algorithm from reinforcement learning with the backstepping method. First, under the condition that the regulator equations are solvable, a coordinate transformation converts the output regulation problem of the uncertain nonlinear system into a stabilization problem. Radial basis function neural networks are used to approximate the unknown nonlinear functions, and a neural network adaptive observer with an internal model is designed to estimate the unmeasurable states. Then, an adaptive internal model based on reinforcement learning is designed, and a cost function associated with the internal model is proposed. An actor-critic-based approximate optimal algorithm is applied at every step of backstepping, so that all virtual controllers are optimal, while the dynamic surface technique avoids the "explosion of complexity" problem in backstepping. Finally, the resulting optimal adaptive output feedback controller not only optimizes the proposed value function but also ensures that all signals of the closed-loop system are semi-globally uniformly ultimately bounded and that the tracking error stays within any desired accuracy. Numerical simulations verify the effectiveness of the proposed method.
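One ingredient named in the abstract, approximating an unknown nonlinear function with a radial basis function network, can be illustrated in isolation. The sketch below is not the paper's design: the target function `unknown_f`, the Gaussian centers and width, and the simple gradient-style weight update are all assumptions chosen for illustration. The paper's adaptive laws come from its observer and backstepping analysis, which the abstract does not give.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian basis vector S(x) for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (width ** 2)) for c in centers]

def rbf_output(weights, x, centers, width):
    """Network output W^T S(x)."""
    feats = rbf_features(x, centers, width)
    return sum(w * s for w, s in zip(weights, feats))

def fit_rbf(f, centers, width, samples, rate=0.2, epochs=500):
    """Adapt the weights with a gradient-style update on the approximation error.

    This update rule is an illustrative assumption, not the paper's adaptive law.
    """
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x in samples:
            feats = rbf_features(x, centers, width)
            err = f(x) - sum(w * s for w, s in zip(weights, feats))
            weights = [w + rate * err * s for w, s in zip(weights, feats)]
    return weights

def unknown_f(x):
    """Hypothetical stand-in for the unknown nonlinearity."""
    return x * math.sin(x)

# Assumed grid: 9 centers on [-2, 2], 41 training samples on the same interval.
centers = [-2.0 + 0.5 * i for i in range(9)]
samples = [-2.0 + 0.1 * i for i in range(41)]
width = 0.5

weights = fit_rbf(unknown_f, centers, width, samples)
max_error = max(
    abs(unknown_f(x) - rbf_output(weights, x, centers, width)) for x in samples
)
print("max approximation error on the sample grid:", max_error)
```

The approximation holds only on the compact interval covered by the centers, which is in keeping with the semi-global nature of the boundedness result stated in the abstract.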