Citation: CUI Li-li, ZHANG Yong, ZHANG Xin. Event-triggered adaptive dynamic programming algorithm for the nonlinear zero-sum differential games[J]. Control Theory & Applications, 2018, 35(5): 610-618.
Event-triggered adaptive dynamic programming algorithm for the nonlinear zero-sum differential games
Received: 2017-09-15    Revised: 2017-12-27
DOI: 10.7641/CTA.2017.70674
Keywords: adaptive dynamic programming; nonlinear zero-sum differential games; event-triggered; neural networks; optimal control
Funding: Supported by the National Natural Science Foundation of China (61703289), the Natural Science Foundation of Shandong Province (BX2015DX009), the Special Fund for Basic Scientific Research of Higher Education Institutions of Liaoning Province (LQN201720, LQN201702), and the Science and Technology Project of Shenyang Normal University (L201510).
Authors, affiliations and e-mail:
CUI Li-li (corresponding author), Shenyang Normal University, cuilili8396@163.com
ZHANG Yong, Shenyang Normal University
ZHANG Xin, China University of Petroleum (East China)
Abstract
      In this paper, an event-triggered adaptive dynamic programming (ET-ADP) algorithm is proposed to solve online the saddle point of a class of nonlinear zero-sum differential games. First, a new adaptive event-triggering condition is proposed. Then, a neural network (the critic network) with the sampled state as its input is used to approximate the optimal value function, and a new neural-network weight update law is designed so that the value function, the control strategy and the disturbance strategy are updated synchronously only at the triggering instants. Furthermore, Lyapunov stability theory is used to prove that the proposed algorithm obtains the saddle point of the nonlinear zero-sum differential game online without exhibiting Zeno behavior. Because the proposed ET-ADP algorithm updates the value function, the control strategy and the disturbance strategy only when the triggering condition is satisfied, it effectively reduces both the computational burden and the network load. Finally, two simulation examples validate the effectiveness of the proposed ET-ADP algorithm.
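To make the update scheme described in the abstract more concrete, the sketch below shows one way such an event-triggered critic loop can be organized in Python. It is only an illustrative sketch under assumed placeholder choices: the dynamics f, g, k, the basis-gradient function phi_grad, the weights Q and R, the attenuation level gamma, the triggering gain, the learning rate and the toy plant in the usage example are all hypothetical, and neither the triggering condition nor the weight update law shown here is the one derived in the paper.

import numpy as np

def et_adp_critic(x0, W0, f, g, k, phi_grad, Q, R, R_inv, gamma,
                  dt=0.01, steps=5000, alpha=0.5, trig_gain=0.2):
    # Illustrative event-triggered critic loop; every tuning value here is a
    # placeholder, not a quantity taken from the paper.
    x, x_hat, W = x0.copy(), x0.copy(), W0.copy()
    for _ in range(steps):
        # Placeholder triggering rule: sample the state when the gap between
        # the current state and the last sampled state becomes too large.
        e = x - x_hat
        triggered = e @ e > trig_gain * (x @ x)
        if triggered:
            x_hat = x.copy()

        # Control and disturbance strategies are evaluated at the last SAMPLED
        # state and held constant between events (zero-order hold).
        gradV = phi_grad(x_hat).T @ W                # approximate dV/dx
        u = -0.5 * R_inv @ g(x_hat).T @ gradV        # control strategy
        w = 0.5 / gamma**2 * k(x_hat).T @ gradV      # worst-case disturbance

        if triggered:
            # Critic weights change only at triggering instants; here a
            # normalized gradient step on a Bellman-type residual is used.
            xdot = f(x_hat) + g(x_hat) @ u + k(x_hat) @ w
            sigma = phi_grad(x_hat) @ xdot
            delta = (W @ sigma + x_hat @ Q @ x_hat
                     + u @ R @ u - gamma**2 * (w @ w))
            W = W - alpha * delta * sigma / (1.0 + sigma @ sigma) ** 2

        # Propagate the plant under the held strategies (explicit Euler step).
        x = x + dt * (f(x) + g(x) @ u + k(x) @ w)
    return W

# Hypothetical usage on a made-up two-state plant with a quadratic basis
# [x1^2, x1*x2, x2^2]; all matrices and constants below are invented.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [0.5]])
W_final = et_adp_critic(
    x0=np.array([1.0, -1.0]), W0=np.zeros(3),
    f=lambda x: A @ x, g=lambda x: B, k=lambda x: D,
    phi_grad=lambda x: np.array([[2 * x[0], 0.0],
                                 [x[1], x[0]],
                                 [0.0, 2 * x[1]]]),
    Q=np.eye(2), R=np.eye(1), R_inv=np.eye(1), gamma=5.0)

The structural point the sketch illustrates is the one emphasized in the abstract: the control and disturbance strategies are computed from the last sampled state and held between events, and the critic weights change only at triggering instants.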