Sound source separation and automatic speech recognition for moving sources

Kazuhiro Nakadai*, Hirofumi Nakajima, Gökhan Ince, Yuji Hasegawa

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Citations (Scopus)

Abstract

This paper addresses sound source separation and speech recognition for moving sound sources. Real-world applications such as robots must cope with both moving and stationary sound sources, yet most studies assume only stationary ones. We introduce three key techniques for handling moving sources: Adaptive Step-size control (AS), Optima Controlled Recursive Average (OCRA), and Separation Parameter Switching (SPS). We implemented a real-time robot audition system with these techniques for our humanoid robot, which has an 8-channel microphone array, using HARK, our open-source software for robot audition. Preliminary results show that recognition performance for moving sound sources improved drastically. The performance of the system is also demonstrated through two speech dialog scenarios that require sound source separation and automatic speech recognition for moving sources.

Original language: English
Title of host publication: IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
Pages: 976-981
Number of pages: 6
Publication status: Published - 2010
Externally published: Yes
Event: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Taipei, Taiwan, Province of China
Duration: 18 Oct 2010 to 22 Oct 2010

Publication series

Name: IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings

Conference

Conference: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 18/10/10 to 22/10/10
