Adaptive fusion of acoustic and visual information in noise-robust speech recognition

LIANG Bing; CHEN De-yun; CHENG Hui

Cite this article:	LIANG Bing, CHEN De-yun, CHENG Hui. Adaptive fusion of acoustic and visual information in noise-robust speech recognition[J]. Control Theory & Applications, 2011, 28(10): 1461-1466.

Received: 2010-05-02 Revised: 2010-11-15

DOI: 10.7641/j.issn.1000-8152.2011.10.CCTA100471

2011,28(10):1461-1466

Keywords: audio-visual information fusion; speech recognition; adaptive weights; learning automata (LA); hidden Markov model

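Funding: National Natural Science Foundation of China (60572153); Heilongjiang Province Postdoctoral Fund (LBH-Z09102); Youth Science Research Fund of Harbin University of Science and Technology (2009YF015); Fundamental Research Funds for the Central Universities (DUT11RC(3)54).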

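Author	Affiliation	E-mail
LIANG Bing^*	School of Innovation Experiment, Dalian University of Technology	newpek@163.com
CHEN De-yun	School of Computer Science and Technology, Harbin University of Science and Technology
CHENG Hui	College of Computer Science and Technology, Harbin Engineering University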

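Chinese abstract

To improve the accuracy and robustness of speech recognition in noisy environments, a noise-robust speech recognition method based on adaptive audio-visual information fusion is proposed. The audio and visual information carry varying weights during recognition, adapting dynamically to the signal-to-noise ratio (SNR) of the environmental input. The optimal weight of the visual information is computed by a learning automaton from the SNR and the recognition performance fed back. Hidden Markov models perform pattern matching on the audio and visual feature vectors, and the decisions of the visual and acoustic hidden Markov models are combined according to the optimal weight to obtain the final recognition result. Experimental results show that, at all noise levels tested, audio-visual fusion with adaptive weights yields better speech recognition performance than fusion with fixed weights.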

English abstract

We propose an adaptive fusion of acoustic and visual information to improve the accuracy and robustness of speech recognition in noisy environments. The acoustic and visual information enter the recognition process with varying weights, which are determined adaptively from the signal-to-noise ratio (SNR) of the environmental input. Based on the SNR and the performance feedback, a learning automaton computes the optimal weight for the visual information. Hidden Markov models match the patterns of the acoustic and visual information, and the decisions of the acoustic and visual hidden Markov models are combined with the optimal weight to give the final recognition result. Experiments were performed under various noise levels; the results show that speech recognition based on adaptive weights outperforms speech recognition based on fixed weights.
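The abstract gives neither the fusion formula nor the automaton's update rule, so the following Python sketch is only one plausible reading of the scheme, not the authors' implementation. It assumes that each hidden Markov model produces a log-likelihood per candidate word, that the two scores are combined linearly with a visual weight w in [0, 1], and that the weight is picked from a small set of candidates by a linear reward-inaction automaton (a standard learning-automaton scheme, chosen here for illustration). All names (`fuse_scores`, `recognize`, `RewardInactionAutomaton`, `learn_visual_weight`) are hypothetical.

```python
import random


def fuse_scores(audio_loglik, visual_loglik, visual_weight):
    """Combine per-word log-likelihoods as (1 - w) * audio + w * visual.

    Assumption: linear late fusion; the abstract does not state the formula.
    """
    return {
        word: (1.0 - visual_weight) * audio_loglik[word]
        + visual_weight * visual_loglik[word]
        for word in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, visual_weight):
    """Return the word with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, visual_weight)
    return max(fused, key=fused.get)


class RewardInactionAutomaton:
    """Linear reward-inaction automaton over discrete candidate weights.

    Assumption: the paper's automaton may use a different update scheme.
    """

    def __init__(self, actions, learning_rate=0.1, rng=None):
        self.actions = list(actions)
        self.probs = [1.0 / len(self.actions)] * len(self.actions)
        self.learning_rate = learning_rate
        self.rng = rng or random.Random(0)

    def choose(self):
        """Sample an action index from the current action probabilities."""
        return self.rng.choices(range(len(self.actions)), weights=self.probs)[0]

    def reward(self, index):
        """Shift probability mass toward the rewarded action.

        Penalties leave the probabilities unchanged (the 'inaction' part).
        """
        a = self.learning_rate
        self.probs = [
            p + a * (1.0 - p) if i == index else (1.0 - a) * p
            for i, p in enumerate(self.probs)
        ]

    def best_action(self):
        """Return the candidate weight with the highest probability."""
        best = max(range(len(self.probs)), key=self.probs.__getitem__)
        return self.actions[best]


def learn_visual_weight(samples, candidate_weights, epochs=200, seed=0):
    """Learn a visual weight for one SNR level.

    samples: list of (audio_loglik, visual_loglik, true_word) tuples.
    A correct recognition is treated as the reward signal.
    """
    automaton = RewardInactionAutomaton(candidate_weights, rng=random.Random(seed))
    for _ in range(epochs):
        for audio, visual, truth in samples:
            index = automaton.choose()
            if recognize(audio, visual, automaton.actions[index]) == truth:
                automaton.reward(index)
    return automaton.best_action()
```

To make the weight depend on the SNR under these assumptions, one could call `learn_visual_weight` once per SNR bin and look up the learned weight for the bin estimated at recognition time. How the paper maps the SNR to a weight is not stated in the abstract.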