Visual speech recognition using PCA networks and LSTMs in a tandem GMM-HMM system

Marina Zimmermann*, Mostafa Mehdipour Ghazi, Hazım Kemal Ekenel, Jean Philippe Thiran

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

8 Citations (Scopus)

Abstract

Automatic visual speech recognition is an interesting problem in pattern recognition, especially when audio data is noisy or not readily available. It is also a very challenging task, mainly because visual articulations carry less information than the audible utterance. In this work, principal component analysis is applied to image patches extracted from the video data to learn the weights of a two-stage convolutional network. Block histograms are then extracted as the unsupervised learning features. These features are used to train a recurrent neural network with a set of long short-term memory cells to obtain spatiotemporal features. Finally, the obtained features are used in a tandem GMM-HMM system for speech recognition. Our results show that the proposed method outperforms the baseline techniques applied to the OuluVS2 audiovisual database for phrase recognition, with the frontal-view cross-validation and testing sentence correctness reaching 79% and 73%, respectively, compared to the baseline of 74% on cross-validation.
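
The first step named in the abstract, learning convolutional weights by applying principal component analysis to image patches, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_patches` and `learn_pca_filters`, the patch size `k=7`, the filter count of 8, and the random stand-in frames are all assumptions chosen for illustration, and the later stages (second PCA stage, block histograms, LSTM, tandem GMM-HMM) are not shown.

```python
import numpy as np


def extract_patches(images, k):
    """Collect every k x k patch (stride 1) from a stack of grayscale images.

    images: array of shape (n, h, w).
    Returns an array of shape (num_patches, k*k) with each patch's mean removed.
    """
    # Resulting view has shape (n, h-k+1, w-k+1, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(images, (k, k), axis=(1, 2))
    patches = windows.reshape(-1, k * k).astype(np.float64)
    return patches - patches.mean(axis=1, keepdims=True)


def learn_pca_filters(images, k=7, num_filters=8):
    """Return num_filters kernels of shape (k, k).

    The kernels are the leading eigenvectors of the patch covariance matrix,
    i.e. the principal components of the mean-removed patches.
    """
    patches = extract_patches(images, k)
    cov = patches.T @ patches / patches.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_filters]
    return eigvecs[:, order].T.reshape(num_filters, k, k)


# Hypothetical input: five random 32x32 frames standing in for mouth-region crops.
rng = np.random.default_rng(0)
frames = rng.random((5, 32, 32))
filters = learn_pca_filters(frames, k=7, num_filters=8)
print(filters.shape)  # (8, 7, 7)
```

Because the filters come from an eigendecomposition rather than gradient descent, this stage needs no labels, which matches the abstract's description of the features as unsupervised.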

Original language: English
Title of host publication: Computer Vision - ACCV 2016 Workshops, ACCV 2016 International Workshops, Revised Selected Papers
Editors: Kai-Kuang Ma, Jiwen Lu, Chu-Song Chen
Publisher: Springer Verlag
Pages: 264-276
Number of pages: 13
ISBN (Print): 9783319544267
Publication status: Published - 2017
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: 20 Nov 2016 to 24 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10117 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th Asian Conference on Computer Vision, ACCV 2016
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 20/11/16 to 24/11/16

Bibliographical note

Publisher Copyright:
© Springer International Publishing AG 2017.
