TY - JOUR
T1 - Whole body motion noise cancellation of a robot for improved automatic speech recognition
AU - Ince, Gökhan
AU - Nakadai, Kazuhiro
AU - Rodemann, Tobias
AU - Tsujino, Hiroshi
AU - Imura, Jun Ichi
PY - 2011
Y1 - 2011
AB - The motors of a robot produce ego-motion noise that degrades the quality of recorded sounds. This paper describes an architecture that enhances the capability of a robot to perform automatic speech recognition (ASR) even as the entire body of the robot moves. The architecture consists of three blocks: (i) a multichannel noise reduction block, consisting of microphone-array-based sound localization, geometric source separation and post-filtering, (ii) a single-channel template subtraction block and (iii) an ASR block. As the first step of our analysis strategy, we divided the whole-body motion noise problem into three subdomains of arm, leg and head motion noise, according to their intensity levels and spatial location. Subsequently, by following a synthesis-by-analysis approach, we determined the best method for suppressing each type of ego-motion noise. Finally, we proposed to utilize a control module in our ASR framework; this module was designed to make decisions based on instantaneously detected motions, allowing it to switch to the most appropriate method for the current type of noise. This proposed system resulted in improvements of up to 50 points in word correct rates compared with results obtained by single microphone recognition of arm, leg and head motions.
KW - Robot audition
KW - automatic speech recognition
KW - noise reduction
KW - template subtraction
KW - whole-body motion noise
UR - https://www.scopus.com/pages/publications/79960736628
U2 - 10.1163/016918611X579448
DO - 10.1163/016918611X579448
M3 - Article
AN - SCOPUS:79960736628
SN - 0169-1864
VL - 25
SP - 1405
EP - 1426
JO - Advanced Robotics
JF - Advanced Robotics
IS - 11
ER -