TY - GEN
T1 - An active audition framework for auditory-driven HRI
T2 - 2012 21st IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2012
AU - Oliveira, Joao Lobato
AU - Ince, Gokhan
AU - Nakamura, Keisuke
AU - Nakadai, Kazuhiro
AU - Okuno, Hiroshi G.
AU - Reis, Luis Paulo
AU - Gouyon, Fabien
PY - 2012
Y1 - 2012
N2 - In this paper, we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework processes speech and music simultaneously and on the fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, a behavior decision mechanism built on top of the framework and based on active audition governs the robot's actions according to the reliability of the acoustic signals available for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two auditory perception modules: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, as well as convincing dance beat synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.
AB - In this paper, we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework processes speech and music simultaneously and on the fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, a behavior decision mechanism built on top of the framework and based on active audition governs the robot's actions according to the reliability of the acoustic signals available for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two auditory perception modules: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, as well as convincing dance beat synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.
UR - http://www.scopus.com/inward/record.url?scp=84870795054&partnerID=8YFLogxK
U2 - 10.1109/ROMAN.2012.6343892
DO - 10.1109/ROMAN.2012.6343892
M3 - Conference contribution
AN - SCOPUS:84870795054
SN - 9781467346054
T3 - Proceedings - IEEE International Workshop on Robot and Human Interactive Communication
SP - 1078
EP - 1085
BT - 2012 IEEE RO-MAN
Y2 - 9 September 2012 through 13 September 2012
ER -