Cite this article: REN Jian, LIU Jian-wei, YANG Pu. Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy[J]. Control Theory & Applications, 2020, 37(7): 1429-1438.
Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy
Received: 2019-05-25  Revised: 2019-12-26
DOI: 10.7641/CTA.2020.90380
2020, 37(7): 1429-1438
Keywords  flight control systems  fault diagnosis  fault tolerance  reinforcement learning  Q-learning algorithm  incremental strategy  state transition prediction network
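Funding  Supported by the Fund of the Key Laboratory of Civil Aircraft Health Monitoring and Intelligent Maintenance (NJ2018012), the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management of the Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), the Fundamental Research Funds for the Central Universities (NS2017017), and the National Natural Science Foundation of China (61533008, 61490703).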
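Author  Affiliation  E-mail
REN Jian  Nanjing University of Aeronautics and Astronautics  398366373@qq.com
LIU Jian-wei*  Nanjing University of Aeronautics and Astronautics  ljw301@nuaa.edu.cn
YANG Pu  Nanjing University of Aeronautics and Astronautics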
Abstract
      For flight control systems subject to faults, a reinforcement learning fault-tolerant tracking control method based on an incremental strategy is proposed. The method uses the system state values obtained from the sensors and a pre-set reward function to make the optimal decision for the current condition of the control system, while continuously updating the value network. This converts the fault-tolerant control process into a sequential decision-making process of the reinforcement learning agent, and an improved incremental strategy gradually approaches the correct compensation strategy for the current fault. In addition, a state transition prediction network is proposed for continuous control systems to obtain the next state value. Finally, the effectiveness of the method is verified on the aircraft fault diagnosis experimental platform of the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management of the Ministry of Industry and Information Technology at Nanjing University of Aeronautics and Astronautics.
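The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of an incremental compensation strategy learned with Q-learning, not the authors' method. It rests on several assumptions that do not come from the paper: a tabular Q-function instead of a value network, a toy plant whose residual tracking error is simply the unknown constant fault plus the accumulated compensation, a reward equal to minus the absolute residual error, uniform random exploration during training, and hypothetical names and constants (`DELTA`, `error_index`, `step`, `train`, `compensate`). Because the toy plant returns the next state directly, the state transition prediction network mentioned in the abstract is not modelled here.

```python
import random

DELTA = 0.05                 # assumed size of one compensation increment
ACTIONS = (-1, 0, +1)        # decrease, hold, or increase the compensation by DELTA
GAMMA, ALPHA = 0.9, 0.5      # assumed discount factor and learning rate


def error_index(fault, comp_steps):
    """Discretise the residual error (fault + compensation) in units of DELTA."""
    return round((fault + comp_steps * DELTA) / DELTA)


def step(fault, comp_steps, action):
    """Toy plant: apply one increment; reward is minus the remaining absolute error."""
    comp_steps += action
    residual = fault + comp_steps * DELTA
    return comp_steps, -abs(residual)


def train(episodes=3000, horizon=20, seed=0):
    """Tabular Q-learning over the discretised residual error."""
    rng = random.Random(seed)
    q = {}  # (error index, action) -> value; missing entries count as 0
    for _ in range(episodes):
        fault = rng.randint(-8, 8) * DELTA   # unknown constant bias, redrawn per episode
        comp_steps = 0
        for _ in range(horizon):
            s = error_index(fault, comp_steps)
            a = rng.choice(ACTIONS)          # uniform exploration (Q-learning is off-policy)
            comp_steps, r = step(fault, comp_steps, a)
            s_next = error_index(fault, comp_steps)
            best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
    return q


def compensate(q, fault, steps=20):
    """Greedy rollout: accumulate increments until the learned policy holds."""
    comp_steps = 0
    for _ in range(steps):
        s = error_index(fault, comp_steps)
        a = max(ACTIONS, key=lambda b: q.get((s, b), 0.0))
        comp_steps, _ = step(fault, comp_steps, a)
    return comp_steps * DELTA


q_table = train()
print(compensate(q_table, fault=0.3))   # compensation the agent settles on
```

In this sketch the agent never estimates the fault in one shot. It only chooses whether to nudge the compensation up, down, or leave it, and the accumulated increments approach the value that cancels the fault. That is the sense in which the strategy is incremental.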