Multi-talker speech recognition under ego-motion noise using missing feature theory

Gökhan Ince*, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun Ichi Imura

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Citations (Scopus)

Abstract

This paper presents a system that gives a mobile robot the ability to recognize target speaker's speech, even if the robot performs an action and there are multiple speakers talking in the room. Associated problems to this system are twofold: (1) While the robot is moving, the joints inevitably generate ego-motion noise due to its motors. (2) Recognizing target speech against other interfering speech signals is a difficult task. Since typical solutions to (1) and (2), motor noise suppression and sound source separation, both introduce distortion to the processed signals, the performance of automatic speech recognition (ASR) deteriorates. Instead of removing the ego-motion noise with conventional noise suppression methods, in this work, we investigate methods to eliminate the unreliable parts of the audio features that are contaminated by the ego-motion noise. For this purpose, we model masks that filter unreliable speech features based on the ratio of speech and motor noise energies. We analyze the performance of the proposed technique under various test conditions by comparing it to the performance of existing Missing Feature Theory-based ASR implementations. Finally, we propose an integration framework for two different masks that are designed to eliminate ego noise and to filter the leakage energy of interfering sound sources. We demonstrate that the proposed methods achieve a high ASR accuracy.

Original languageEnglish
Title of host publicationIEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
Pages982-987
Number of pages6
DOIs
Publication statusPublished - 2010
Externally publishedYes
Event23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Taipei, Taiwan, Province of China
Duration: 18 Oct 201022 Oct 2010

Publication series

NameIEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings

Conference

Conference23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010
Country/TerritoryTaiwan, Province of China
CityTaipei
Period18/10/1022/10/10

Fingerprint

Dive into the research topics of 'Multi-talker speech recognition under ego-motion noise using missing feature theory'. Together they form a unique fingerprint.

Cite this