Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning

ZHANG Ping; Stéphane Canu

Cite this article:	ZHANG Ping and Stéphane Canu. Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning[J]. Control Theory & Applications, 1998, 15(1): 100~104.

Received: 1996-02-26; Revised: 1996-10-30

1998,15(1):100-104

Keywords (Chinese): reinforcement learning; Q-learning; entropy estimate; uncertainty; Markov process

Keywords (English): reinforcement learning; Q-learning; entropy estimate; uncertainty; Markov decision

Authors	Affiliation
ZHANG Ping, Stéphane Canu

Abstract (Chinese)

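A reinforcement learning system is one that interacts with an unconstrained, unknown environment. The goal of the learning system is to obtain as much cumulative reward as possible. This reward signal is obtained from the system's environment over a finite, unknown lifetime. One difficulty for a reinforcement learning system is that the reward signal is very sparse, especially for systems that receive only delayed signals. Existing reinforcement learning methods, such as the well-known Q-learning, store reward signals in the form of a value function. This paper proposes a method based on a model of state uncertainty estimation. The algorithm makes effective use of the reward information stored in the value function. It applies both to the case of immediate rewards and to the case of delayed rewards. Experimental results show that the algorithm exhibits good learning behavior.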

Abstract (English)

A reinforcement learning (RL) system interacts with an unrestricted, unknown environment. Its goal is to maximize the cumulative reward obtained over its limited, unknown lifetime. One difficulty for an RL system is that the reward signal is sparse, especially for RL systems with very delayed rewards. In this paper, we describe an algorithm based on a model of the state's uncertainty estimate. It makes efficient use of the reward information stored in the value function. The experiments show that the algorithm performs very well.
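
As a rough illustration of the ideas named in the abstract, the sketch below pairs the standard tabular Q-learning update with a per-state uncertainty score. The abstract does not define the pseudo-entropy, so the score here is only a stand-in: the normalized Shannon entropy of a softmax over a state's Q-values. The function names, the `temperature` parameter, and the two-state table are all assumptions made for this example, not the authors' algorithm.

```python
import math


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def state_entropy(q_values, temperature=1.0):
    """Stand-in uncertainty score (NOT the paper's pseudo-entropy).

    Normalized Shannon entropy of a softmax over one state's Q-values.
    Requires at least two actions; returns a value in [0, 1].
    """
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(q_values))


# Hypothetical table: 2 states x 2 actions, all values initially zero.
q = [[0.0, 0.0], [0.0, 0.0]]
before = state_entropy(q[0])  # equal Q-values: maximal uncertainty

# Repeatedly observe reward 1.0 for taking action 1 in state 0.
for _ in range(50):
    q_update(q, 0, 1, 1.0, 1)

after = state_entropy(q[0])  # Q-values now differ, so the score drops
print(before, after)
```

With this stand-in, a state whose actions all look equally good scores close to 1, and the score falls as the value function accumulates reward information that separates the actions.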