TY - GEN
T1 - An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset
AU - Nickel, Kai
AU - Gehrig, Tobias
AU - Ekenel, Hazim K.
AU - McDonough, John
AU - Stiefelhagen, Rainer
PY - 2007
Y1 - 2007
N2 - We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross correlation function. In the CLEAR'06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
UR - http://www.scopus.com/inward/record.url?scp=38049178208&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-69568-4_4
DO - 10.1007/978-3-540-69568-4_4
M3 - Conference contribution
AN - SCOPUS:38049178208
SN - 9783540695677
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 69
EP - 80
BT - Multimodal Technologies for Perception of Humans - First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006 Revised Selected Papers
PB - Springer Verlag
T2 - 1st International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006
Y2 - 6 April 2006 through 7 April 2006
ER -