An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset

Kai Nickel*, Tobias Gehrig, Hazim K. Ekenel, John McDonough, Rainer Stiefelhagen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

6 Citations (Scopus)

Abstract

We present an approach for tracking a lecturer over the course of a talk. We use features from multiple cameras and microphones and process them in a joint particle filter framework. The filter samples 3D location hypotheses, projects them into the individual sensors, and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper-body detection. On the audio side, the time delays of arrival (TDOAs) between pairs of microphones are estimated with a generalized cross-correlation function. In the CLEAR'06 evaluation, the system achieved a multiple object tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
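The pipeline described above (TDOA estimation by generalized cross-correlation, then weighting sampled 3D hypotheses by audio and video evidence) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: the PHAT weighting of the cross-correlation, a speed of sound of 343 m/s, a Gaussian TDOA likelihood with spread `sigma_tau`, a random-walk motion model, systematic resampling, and fusion by simply multiplying the audio and video likelihoods. The function names `gcc_phat_tdoa`, `expected_tdoa` and `particle_filter_step` are hypothetical, and `video_score` is a hypothetical callable standing in for the projected foreground, face and upper-body detector responses.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Delay of x2 relative to x1 in seconds, via GCC with PHAT weighting (assumed).

    A positive value means the sound reaches the second microphone later.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT: keep only the phase information
    cc = np.fft.irfft(cross, n=n)
    max_shift = min(len(x1), len(x2)) - 1
    if max_tau is not None:
        max_shift = min(max_shift, int(fs * max_tau))
    max_shift = max(max_shift, 1)
    # Reorder so that index i holds lag (i - max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs


def expected_tdoa(pos, mic_a, mic_b, c=SPEED_OF_SOUND):
    """TDOA (s) that a source at pos (shape (..., 3)) would produce for a mic pair."""
    return (np.linalg.norm(pos - mic_b, axis=-1)
            - np.linalg.norm(pos - mic_a, axis=-1)) / c


def particle_filter_step(particles, pairs, measured_tdoas, video_score, rng,
                         motion_std=0.05, sigma_tau=1e-4):
    """One predict/weight/resample cycle over an N x 3 array of location hypotheses."""
    # Predict: random-walk motion model (an assumption)
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: Gaussian TDOA likelihood per microphone pair, accumulated in log space
    log_w = np.zeros(len(particles))
    for (mic_a, mic_b), tau in zip(pairs, measured_tdoas):
        err = tau - expected_tdoa(particles, mic_a, mic_b)
        log_w -= 0.5 * (err / sigma_tau) ** 2
    # Fuse with the hypothetical video score by multiplying likelihoods
    log_w += np.log(video_score(particles) + 1e-12)
    w = np.exp(log_w - log_w.max())  # subtract max to avoid underflow
    w /= w.sum()
    estimate = w @ particles  # weighted mean = speaker location estimate
    # Resample: systematic resampling
    cdf = np.cumsum(w)
    cdf[-1] = 1.0  # guard against floating-point round-off
    positions = (rng.random() + np.arange(len(w))) / len(w)
    idx = np.searchsorted(cdf, positions)
    return particles[idx], estimate
```

Working in log space keeps the product of many sharp per-pair likelihoods numerically stable, and using more than four non-coplanar microphones helps avoid the two-solution ambiguity that a minimal TDOA setup can have in 3D.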

Original language: English
Title of host publication: Multimodal Technologies for Perception of Humans - First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Revised Selected Papers
Publisher: Springer Verlag
Pages: 69-80
Number of pages: 12
ISBN (Print): 9783540695677
Publication status: Published - 2007
Externally published: Yes
Event: 1st International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006 - Southampton, United Kingdom
Duration: 6 Apr 2006 to 7 Apr 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4122 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006
Country/Territory: United Kingdom
City: Southampton
Period: 6/04/06 to 7/04/06

