| Cite this article: | LIU Xiao-min, YU Meng-jun, WANG Hao-yu, YANG Chun-yu, ZHOU Lin-na, ZHOU Huai-chun. Experience-guided critic-only Q-learning load control for boiler-turbine system[J]. Control Theory & Applications, 2026, 43(5): 1034-1042. |
|
| Experience-guided critic-only Q-learning load control for boiler-turbine system |
| Received: 2024-05-06 Revised: 2025-11-07 |
| DOI: 10.7641/CTA.2025.40256 |
| 2026,43(5):1034-1042 |
| Keywords: boiler-turbine system; experience-guided; critic-only network; Q-learning; load tracking |
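| Funding: Supported by the National Natural Science Foundation of China (62073327, 62273350, 62303468, 62303469), the Natural Science Foundation of Jiangsu Province (BK20221112, BK20221116), the China Postdoctoral Science Foundation (2023M733757), the Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB530), and the Key Research and Development Project of Shanxi Province (202202100401002). |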
|
| Abstract |
| To address the challenges in load control of boiler-turbine systems, namely the difficulty of building
an accurate mathematical model, the asymmetric nature of the valve constraints, and the reliance on a single method for
extracting operational experience data, this paper proposes an adaptive load-tracking control method for boiler-turbine
systems based on an experience-guided critic-only Q-learning algorithm. A constraint transformation function is introduced
that maps the asymmetrically constrained inputs to the median of the control range, which handles the asymmetry and
reshapes the performance index function into a form without additional penalty terms. To reduce the online computational
load, a lightweight critic-only network Q-learning algorithm is proposed for fast learning of the modified performance
index function. Using the policy obtained from the update in the previous episode, an experience-guided relationship is
established online among the multi-episode data, and a new multi-episode segmented training mode is built on it to mine
the data efficiently and accelerate convergence of the algorithm. Simulations on a 160 MW boiler-turbine system verify
the effectiveness and superiority of the proposed control algorithm. |
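
The abstract does not give the constraint transformation function itself. As a rough illustration of the idea of re-centering an asymmetrically bounded input on the median of its range, the sketch below assumes a tanh-based saturation, which is a common choice in constrained adaptive dynamic programming but is not confirmed by the source. The function names and the bounds (-2.0, 0.02) are hypothetical and chosen only to show an asymmetric interval.

```python
import math

def midpoint_and_half_range(u_min, u_max):
    """Return the median of the control range and half of its width."""
    return 0.5 * (u_max + u_min), 0.5 * (u_max - u_min)

def to_symmetric(u, u_min, u_max):
    """Map u in [u_min, u_max] to v in [-1, 1]; the range median maps to 0."""
    mid, half = midpoint_and_half_range(u_min, u_max)
    return (u - mid) / half

def saturate(raw, u_min, u_max):
    """Squash an unconstrained value into [u_min, u_max] (assumed tanh form)."""
    mid, half = midpoint_and_half_range(u_min, u_max)
    return mid + half * math.tanh(raw)

# Hypothetical asymmetric bounds, for illustration only.
U_MIN, U_MAX = -2.0, 0.02

if __name__ == "__main__":
    for raw in (-5.0, 0.0, 5.0):
        u = saturate(raw, U_MIN, U_MAX)
        print(raw, u, to_symmetric(u, U_MIN, U_MAX))
```

With this re-centering, a zero unconstrained command corresponds to the median of the admissible range rather than to zero, so the saturation is symmetric in the shifted variable even though the original bounds are not.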