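Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy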

REN Jian; LIU Jian-wei; YANG Pu

Cite this article:	REN Jian, LIU Jian-wei, YANG Pu. Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy[J]. Control Theory & Applications, 2020, 37(7): 1429-1438.

Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy

Abstract views: 4231; Full-text views: 1275; Received: 2019-05-25; Revised: 2019-12-26

DOI: 10.7641/CTA.2020.90380

2020,37(7):1429-1438

Keywords: flight control systems; fault diagnosis; fault tolerance; reinforcement learning; Q-learning algorithm; incremental strategy; state transition prediction network

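Funding: Supported by the Fund of the Key Laboratory of Civil Aircraft Health Monitoring and Intelligent Maintenance (NJ2018012), the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management of the Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), the Fundamental Research Funds for the Central Universities (NS2017017), and the National Natural Science Foundation of China (61533008, 61490703).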

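Author	Affiliation	E-mail
REN Jian	Nanjing University of Aeronautics and Astronautics	398366373@qq.com
LIU Jian-wei^*	Nanjing University of Aeronautics and Astronautics	ljw301@nuaa.edu.cn
YANG Pu	Nanjing University of Aeronautics and Astronautics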

Abstract

For flight control systems subject to faults, a reinforcement-learning-based fault-tolerant tracking control method with an incremental strategy is proposed. The method uses the system state values measured by sensors and a pre-set reward function to make the optimal decision for the current condition of the control system, while continuously updating the value network. This transforms the fault-tolerant control process into a sequential decision-making process of the reinforcement learning agent, in which an improved incremental strategy gradually approaches the correct compensation for the current fault. In addition, for continuous control systems, a state transition prediction network is proposed to obtain the next state value. Finally, the effectiveness of the proposed method is verified on the aircraft fault diagnosis experimental platform of the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management (Ministry of Industry and Information Technology) at Nanjing University of Aeronautics and Astronautics.
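
The idea in the abstract of approaching a fault compensation through small increments chosen by a Q-learning agent can be illustrated with a toy sketch. This is not the authors' implementation: the scalar actuator-bias fault, the increment size, the error binning, the reward `-|error|`, the tabular Q-table (in place of a value network), and all function names below are assumptions made only for illustration, and the state transition prediction network is omitted because the toy model gives the next state directly.

```python
import random

# Hypothetical setup: an unknown constant actuator bias `fault` is cancelled by a
# compensation term `comp` that the agent may only change in small increments.
ACTIONS = (-0.1, 0.0, 0.1)   # assumed compensation increments
WIDTH = 0.1                  # assumed width of one tracking-error bin
N_BINS = 21                  # bins cover errors of roughly -1.0 .. 1.0


def bin_error(err):
    """Map a tracking error to a discrete state index, clipped to the table."""
    idx = int(round(err / WIDTH)) + N_BINS // 2
    return max(0, min(N_BINS - 1, idx))


def greedy(q, state):
    """Index of the highest-valued action in the given state."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train(episodes=2000, steps=30, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over randomly drawn faults; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_BINS)]
    for _ in range(episodes):
        fault = rng.uniform(-1.0, 1.0)      # unknown to the agent
        comp = 0.0
        state = bin_error(fault - comp)
        for _ in range(steps):
            # Epsilon-greedy choice of the next compensation increment.
            if rng.random() < eps:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            comp += ACTIONS[action]
            err = fault - comp              # residual seen as tracking error
            nxt = bin_error(err)
            reward = -abs(err)              # assumed reward: penalize the error
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def compensate(q, fault, steps=30):
    """Apply the learned greedy policy and return the final compensation."""
    comp = 0.0
    for _ in range(steps):
        comp += ACTIONS[greedy(q, bin_error(fault - comp))]
    return comp


if __name__ == "__main__":
    table = train()
    for f in (0.5, -0.7):
        print(f, round(compensate(table, f), 2))
```

Because the agent only observes the binned tracking error and never the fault itself, the same learned table can be reused for different fault magnitudes within the trained range; the final compensation is accurate only to about one increment.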