Cite this article: LIANG Bing, CHEN De-yun, CHENG Hui. Adaptive fusion of acoustic and visual information in noise-robust speech recognition[J]. 控制理论与应用 (Control Theory & Applications), 2011, 28(10): 1461-1466.
Adaptive fusion of acoustic and visual information in noise-robust speech recognition
Received: 2010-05-02  Revised: 2010-11-15
DOI  10.7641/j.issn.1000-8152.2011.10.CCTA100471
  2011,28(10):1461-1466
Keywords  audio-visual information fusion  speech recognition  adaptive weights  learning automata (LA)  hidden Markov model
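Funding  National Natural Science Foundation of China (60572153); Heilongjiang Postdoctoral Foundation (LBH-Z09102); Youth Science Research Foundation of Harbin University of Science and Technology (2009YF015); Fundamental Research Funds for the Central Universities (DUT11RC(3)54).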
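Author  Affiliation  E-mail
LIANG Bing*  School of Innovation Experiment, Dalian University of Technology  newpek@163.com
CHEN De-yun  School of Computer Science and Technology, Harbin University of Science and Technology
CHENG Hui  College of Computer Science and Technology, Harbin Engineering University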
Abstract
      To improve the accuracy and robustness of speech recognition in noisy environments, we propose a noise-robust recognition method based on the adaptive fusion of acoustic and visual information. The two information streams enter the recognition process with varying weights that adapt dynamically to the signal-to-noise ratio (SNR) of the environmental input. Based on the SNR and the fed-back recognition performance, a learning automaton computes the optimal weight for the visual information. Hidden Markov models perform pattern matching on the acoustic and visual feature vectors, and the decisions of the acoustic and visual hidden Markov models are combined according to the optimal weight to give the final recognition result. Experiments at various noise levels show that audio-visual speech recognition with adaptive weights outperforms recognition with fixed weights.
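The abstract does not give the fusion formula or the automaton's update rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each hidden Markov model returns one log-likelihood per candidate word, that the two scores are combined linearly with a visual weight w, and that the weight is picked by a linear reward-inaction learning automaton over a small set of candidate weights. All names (`fuse_scores`, `recognize`, `WeightAutomaton`, `train_step`) and all numeric values are hypothetical.

```python
import random


def fuse_scores(audio_loglik, visual_loglik, visual_weight):
    """Late fusion: (1 - w) * audio score + w * visual score per candidate word."""
    return {
        word: (1.0 - visual_weight) * audio_loglik[word]
        + visual_weight * visual_loglik[word]
        for word in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, visual_weight):
    """Return the candidate word with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, visual_weight)
    return max(fused, key=fused.get)


class WeightAutomaton:
    """Linear reward-inaction automaton over candidate visual weights (assumed scheme)."""

    def __init__(self, weights, learning_rate=0.1, seed=0):
        self.weights = list(weights)
        # Start with a uniform action-probability vector.
        self.probs = [1.0 / len(self.weights)] * len(self.weights)
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def choose(self):
        """Sample the index of a candidate weight from the current probabilities."""
        return self.rng.choices(range(len(self.weights)), weights=self.probs)[0]

    def reward(self, index):
        """Shift probability mass toward the rewarded action; the sum stays 1."""
        a = self.learning_rate
        self.probs = [
            p + a * (1.0 - p) if i == index else (1.0 - a) * p
            for i, p in enumerate(self.probs)
        ]

    def best_weight(self):
        """Return the candidate weight that currently has the highest probability."""
        best = max(range(len(self.probs)), key=self.probs.__getitem__)
        return self.weights[best]


def train_step(automaton, audio_loglik, visual_loglik, true_word):
    """One feedback round: reward the chosen weight only if recognition was correct."""
    index = automaton.choose()
    guess = recognize(audio_loglik, visual_loglik, automaton.weights[index])
    if guess == true_word:
        automaton.reward(index)  # on an error the probabilities are left unchanged
    return guess
```

To make the weight depend on the SNR, one could keep a separate `WeightAutomaton` per SNR range (for example in a dictionary keyed by an SNR bin) and train each on utterances recorded at that noise level; this binning is likewise an assumption rather than something stated in the abstract.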