Abstract
Automatic emotion recognition in in-the-wild video datasets is a challenging problem because of inter-class similarities among different facial expressions and large intra-class variability caused by significant changes in illumination, pose, scene, and expression. In this paper, we present our method for video-based emotion recognition in the EmotiW 2016 challenge. The task addresses the unconstrained emotion recognition problem: training on short video clips extracted from movies and testing on short movie clips and spontaneous video clips from reality-TV data. Four methods are employed to extract both static and dynamic emotion representations from the videos. First, local binary patterns on three orthogonal planes (LBP-TOP) describe spatiotemporal features of the video frames. Second, principal component analysis is applied to image patches in a two-stage convolutional network to learn filter weights and extract facial features from the aligned faces. Third, the VGGFace deep convolutional neural network is deployed to extract deep facial representations from the aligned faces. Fourth, a bag of visual words is computed from dense scale-invariant feature transform (SIFT) descriptors of the aligned face images to form hand-crafted representations. Support vector machines are then trained to classify the resulting spatiotemporal representations and facial features. Finally, score-level fusion combines the classification results to predict the emotion label of each video clip. The results show that the proposed combined method outperforms each individual technique, with overall validation and test accuracies of 43.13% and 40.13%, respectively. The system is a relatively good classifier for the Happy and Angry categories but is unsuccessful in detecting Surprise, Disgust, and Fear.
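The final stage described above, per-feature SVM classifiers combined by score-level fusion, can be sketched as follows. This is a minimal illustration with synthetic stand-in features and placeholder dimensions, not the paper's actual features, kernels, or fusion weights:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 60, 5, 3

# Synthetic stand-ins for two feature modalities of the same video clips
# (in the paper these would be e.g. LBP-TOP and SIFT-BoVW descriptors).
feat_a = rng.normal(size=(n_train, 16))
feat_b = rng.normal(size=(n_train, 32))
labels = rng.integers(0, n_classes, size=n_train)

test_a = rng.normal(size=(n_test, 16))
test_b = rng.normal(size=(n_test, 32))

# One probabilistic SVM per feature modality.
clf_a = SVC(kernel="linear", probability=True, random_state=0).fit(feat_a, labels)
clf_b = SVC(kernel="linear", probability=True, random_state=0).fit(feat_b, labels)

# Score-level fusion: average the per-class probability scores of the
# individual classifiers, then take the argmax as the fused prediction.
scores = (clf_a.predict_proba(test_a) + clf_b.predict_proba(test_b)) / 2.0
fused_pred = scores.argmax(axis=1)
```

An unweighted average is used here for simplicity; weighted fusion (tuning per-classifier weights on the validation set) is a common variant of the same idea.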
| Original language | English |
|---|---|
| Title of host publication | ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction |
| Editors | Catherine Pelachaud, Yukiko I. Nakano, Toyoaki Nishida, Carlos Busso, Louis-Philippe Morency, Elisabeth Andre |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 514-521 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781450345569 |
| DOIs | |
| Publication status | Published - 31 Oct 2016 |
| Event | 18th ACM International Conference on Multimodal Interaction, ICMI 2016 - Tokyo, Japan. Duration: 12 Nov 2016 → 16 Nov 2016 |
Publication series

| Name | ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction |
|---|---|
Conference

| Conference | 18th ACM International Conference on Multimodal Interaction, ICMI 2016 |
|---|---|
| Country/Territory | Japan |
| City | Tokyo |
| Period | 12/11/16 → 16/11/16 |
Bibliographical note
Publisher Copyright: © 2016 ACM.
Keywords
- Automatic emotion recognition
- Convolutional neural network
- Local binary patterns
- Principal component analysis
- Scale-invariant feature transform
- Support vector machines