A novel perceptual feature set for audio emotion recognition

Mehmet Cenk Sezgin, Bilge Gunsel, Gunes Karabulut Kurt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Citations (Scopus)

Abstract

We present a novel system for audio emotion recognition based on the Perceptual Evaluation of Audio Quality (PEAQ) model described by the ITU-R BS.1387-1 standard, which provides a mathematical model resembling the human auditory system. The introduced feature set performs perceptual analysis in the time, spectral, and Bark domains, enabling us to represent the statistics of emotional audio in the arousal and valence dimensions with a small number of features. Unlike existing systems, the proposed feature set learns the statistical characteristics of emotional differences and hence does not require data normalization to eliminate speaker or corpus dependency. Recognition performance obtained on the well-known VAM and EMO-DB corpora shows that the classification accuracy achieved by the proposed feature set outperforms the reported benchmark results, particularly for valence, on both natural and acted emotional data.
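
The paper itself specifies the PEAQ-based features; as a rough illustration of what Bark-domain perceptual analysis with per-band statistics can look like, a minimal NumPy sketch is given below. This is not the authors' actual pipeline: the Traunmüller Bark approximation, the equal-width band mapping, the log-energy mean/std statistics, and all parameter values are assumptions made for the example.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Traunmüller's approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_band_features(signal, sr, n_fft=1024, hop=512, n_bands=24):
    """Frame the signal, compute per-frame Bark-band energies,
    and summarize each band by its mean and standard deviation."""
    # Frame the signal with a Hann window.
    window = np.hanning(n_fft)
    n_frames = max(1, 1 + (len(signal) - n_fft) // hop)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Magnitude spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Assign each FFT bin to one of n_bands equal-width Bark bands
    # (a simplification; PEAQ uses its own critical-band grouping).
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)
    band_idx = np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)
    # Sum energy per Bark band in each frame.
    energies = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        energies[:, b] = (spec[:, band_idx == b] ** 2).sum(axis=1)
    # Statistics over frames: one compact feature vector per utterance.
    log_e = np.log(energies + 1e-10)
    return np.concatenate([log_e.mean(axis=0), log_e.std(axis=0)])

# Example: features for one second of synthetic audio at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
print(bark_band_features(x, sr).shape)  # (48,) = 24 means + 24 stds
```

Summarizing each band with utterance-level statistics, rather than keeping every frame, is what keeps the feature vector small, which matches the abstract's claim of representing emotional audio with a small number of features.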

Original language: English
Title of host publication: 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011
Pages: 780-785
Number of pages: 6
DOIs
Publication status: Published - 2011
Event: 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011 - Santa Barbara, CA, United States
Duration: 21 Mar 2011 – 25 Mar 2011

Publication series

Name: 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011

Conference

Conference: 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, FG 2011
Country/Territory: United States
City: Santa Barbara, CA
Period: 21/03/11 – 25/03/11

Keywords

  • emotion recognition
  • PEAQ
  • perceptual audio feature extraction
