Experience-guided critic-only Q-learning load control for boiler-turbine system

LIU Xiao-min; YU Meng-jun; WANG Hao-yu; YANG Chun-yu; ZHOU Lin-na; ZHOU Huai-chun

Cite this article:	LIU Xiao-min, YU Meng-jun, WANG Hao-yu, YANG Chun-yu, ZHOU Lin-na, ZHOU Huai-chun. Experience-guided critic-only Q-learning load control for boiler-turbine system[J]. Control Theory & Applications (控制理论与应用), 2026, 43(5): 1034-1042.

Submitted: 2024-05-06    Revised: 2025-11-07

DOI: 10.7641/CTA.2025.40256

2026,43(5):1034-1042

Keywords: boiler-turbine system; experience-guided; critic-only network; Q-learning; load tracking

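Funding: Supported by the National Natural Science Foundation of China (62073327, 62273350, 62303468, 62303469), the Natural Science Foundation of Jiangsu Province (BK20221112, BK20221116), the China Postdoctoral Science Foundation (2023M733757), the Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB530), and the Key Research and Development Program of Shanxi Province (202202100401002).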

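Author	Affiliation	E-mail
LIU Xiao-min	School of Information and Control Engineering, China University of Mining and Technology	xiaominliu@cumt.edu.cn
YU Meng-jun	School of Information and Control Engineering, China University of Mining and Technology
WANG Hao-yu	School of Information and Control Engineering, China University of Mining and Technology
YANG Chun-yu*	School of Information and Control Engineering, China University of Mining and Technology	chunyuyang@cumt.edu.cn
ZHOU Lin-na	School of Information and Control Engineering, China University of Mining and Technology
ZHOU Huai-chun	School of Low-Carbon Energy and Power Engineering, China University of Mining and Technology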

Abstract

Load control of boiler-turbine systems faces three challenges: an accurate mathematical model is difficult to establish, the valve constraints are asymmetric, and operational experience data are extracted in only a single way. To address these challenges, this paper proposes an adaptive load-tracking control method for boiler-turbine systems based on an experience-guided critic-only Q-learning algorithm. A constraint transformation function is introduced to map the asymmetrically constrained inputs to the median of the control range, which handles the asymmetry and reshapes the performance index function into a form without additional penalty terms. To reduce the online computational load, a lightweight critic-only network Q-learning algorithm is proposed for fast learning of the improved performance index function. The policy obtained from the previous episode's update is used to establish an experience-guided relationship among multi-episode data online, yielding a new multi-episode segmented training mode that mines the data efficiently and accelerates convergence of the algorithm. Simulations on a 160 MW boiler-turbine system verify the effectiveness and superiority of the proposed control algorithm.
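
The abstract does not state the form of the constraint transformation function. As a rough illustration only, the sketch below shows one common way to handle an asymmetric input range: recentre it on its midpoint and squash an unconstrained variable with a shifted, scaled `tanh`. The function name `make_constraint_transform` and the bounds `-0.2` and `1.0` are hypothetical and are not taken from the paper.

```python
import math


def make_constraint_transform(u_min: float, u_max: float):
    """Return (to_constrained, to_unconstrained) for the range [u_min, u_max].

    Illustrative assumption: a shifted tanh map, not the paper's function.
    """
    if not u_min < u_max:
        raise ValueError("u_min must be strictly less than u_max")
    u_mid = 0.5 * (u_max + u_min)   # midpoint (median) of the control range
    u_half = 0.5 * (u_max - u_min)  # half-width, symmetric about u_mid

    def to_constrained(v: float) -> float:
        # Any real v maps into [u_min, u_max]; v = 0 gives the midpoint.
        return u_mid + u_half * math.tanh(v)

    def to_unconstrained(u: float) -> float:
        # Inverse map, defined only for u strictly inside the bounds.
        return math.atanh((u - u_mid) / u_half)

    return to_constrained, to_unconstrained


# Hypothetical asymmetric valve limits, chosen only for illustration.
to_u, to_v = make_constraint_transform(-0.2, 1.0)
```

Under this assumption the limits become symmetric about the midpoint in the transformed variable, so a learner can work with an unconstrained `v` while the applied input `to_u(v)` stays within the bounds by construction. This is one way a cost function could avoid a separate penalty term for constraint violation.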