TY - JOUR
T1 - Perceptual audio features for emotion detection
AU - Sezgin, Mehmet Cenk
AU - Gunsel, Bilge
AU - Kurt, Gunes Karabulut
PY - 2012
Y1 - 2012
N2 - In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics that are given in perceptual evaluation of audio quality known as ITU BS.1387 recommendation. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness which defines relatively objective criteria for emotion detection. The features computed in critical bands based on the reference concept include the partial loudness of the emotional difference, emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and perceptual bandwidth. A soft-majority voting decision rule that strengthens the conventional majority voting is proposed to assess the classifier outputs. Compared to the state-of-the-art systems including Munich Open-Source Emotion and Affect Recognition Toolkit, Hidden Markov Toolkit, and Generalized Discriminant Analysis, it is shown that the emotion recognition rates are improved between 7-16% for EMO-DB and 7-11% in VAM for "all" and "valence" tasks.
AB - In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics that are given in perceptual evaluation of audio quality known as ITU BS.1387 recommendation. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness which defines relatively objective criteria for emotion detection. The features computed in critical bands based on the reference concept include the partial loudness of the emotional difference, emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and perceptual bandwidth. A soft-majority voting decision rule that strengthens the conventional majority voting is proposed to assess the classifier outputs. Compared to the state-of-the-art systems including Munich Open-Source Emotion and Affect Recognition Toolkit, Hidden Markov Toolkit, and Generalized Discriminant Analysis, it is shown that the emotion recognition rates are improved between 7-16% for EMO-DB and 7-11% in VAM for "all" and "valence" tasks.
KW - Audio emotion recognition
KW - PEAQ
KW - Perceptual audio feature extraction
UR - http://www.scopus.com/inward/record.url?scp=84873854357&partnerID=8YFLogxK
U2 - 10.1186/1687-4722-2012-16
DO - 10.1186/1687-4722-2012-16
M3 - Article
AN - SCOPUS:84873854357
SN - 1687-4714
VL - 2012
JO - Eurasip Journal on Audio, Speech, and Music Processing
JF - Eurasip Journal on Audio, Speech, and Music Processing
IS - 1
M1 - 16
ER -