TY - GEN
T1 - Intelligent sound source localization and its application to multimodal human tracking
AU - Nakamura, Keisuke
AU - Nakadai, Kazuhiro
AU - Asano, Futoshi
AU - Ince, Gökhan
PY - 2011
Y1 - 2011
N2 - We have assessed robust tracking of humans based on intelligent Sound Source Localization (SSL) for a robot in a real environment. SSL is fundamental for robot audition, but has three issues in a real environment: robustness against noise with high power, lack of a general framework for selective listening to sound sources, and tracking of inactive and/or noisy sound sources. To address the first issue, we extended Multiple SIgnal Classification by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC) so that it can deal with high power noise and can select target sound sources. To address the second issue, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC to realize a selective listening function. To address the third issue, we integrated audio-visual human tracking using particle filtering. Integration of these three techniques into an intelligent human tracking system showed: 1) GEVD-MUSIC improved the noise-robustness of SSL by a signal-to-noise ratio of 5-6 dB; 2) SSI achieved an F-measure of more than 70% even in a noisy environment; and 3) audio-visual integration reduced the average tracking error by approximately 50%.
AB - We have assessed robust tracking of humans based on intelligent Sound Source Localization (SSL) for a robot in a real environment. SSL is fundamental for robot audition, but has three issues in a real environment: robustness against noise with high power, lack of a general framework for selective listening to sound sources, and tracking of inactive and/or noisy sound sources. To address the first issue, we extended Multiple SIgnal Classification by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC) so that it can deal with high power noise and can select target sound sources. To address the second issue, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC to realize a selective listening function. To address the third issue, we integrated audio-visual human tracking using particle filtering. Integration of these three techniques into an intelligent human tracking system showed: 1) GEVD-MUSIC improved the noise-robustness of SSL by a signal-to-noise ratio of 5-6 dB; 2) SSI achieved an F-measure of more than 70% even in a noisy environment; and 3) audio-visual integration reduced the average tracking error by approximately 50%.
UR - http://www.scopus.com/inward/record.url?scp=84455168731&partnerID=8YFLogxK
U2 - 10.1109/IROS.2011.6048166
DO - 10.1109/IROS.2011.6048166
M3 - Conference contribution
AN - SCOPUS:84455168731
SN - 9781612844541
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 143
EP - 148
BT - IROS'11 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
T2 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11
Y2 - 25 September 2011 through 30 September 2011
ER -