
Landmark-Based Fast LIP Reading with CTC Loss

  • Oguz Ali Arslan
  • Doruk Uzgun
  • Batuhan Cengiz
  • Cihan Topal
  • Istanbul Technical University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Lip reading is a challenging visual recognition task that typically requires substantial computational resources. In this study, we propose a lightweight and efficient approach to visual speech recognition that uses facial landmark coordinates instead of raw video input. By representing full facial movements through these coordinates, our method significantly reduces computational complexity while retaining essential visual information. Our pipeline begins with face detection and landmark extraction for each batch of frames representing a single word. The extracted landmarks are organized into an input matrix, which is then fed into a compact spatio-temporal CNN-RNN network. To model temporal dynamics and perform word recognition, the network is trained with Connectionist Temporal Classification (CTC) loss, which allows alignment of variable-length inputs. Experiments on the MIRACL-VC1 dataset demonstrate that our CTC-based model achieves 93.33% word-level accuracy with significantly reduced inference time. The proposed method delivers a highly efficient lip-reading pipeline, ideal for real-time or edge-device deployment.
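The abstract does not describe the landmark layout or the decoding step, so the following is only a minimal sketch of two ideas it mentions: arranging per-frame landmark coordinates into an input matrix, and reading a label sequence out of CTC-style per-frame scores using standard best-path (greedy) decoding. The function names, the blank index of 0, and the toy numbers are all assumptions made for illustration and are not taken from the authors' implementation.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)


def landmarks_to_matrix(frames):
    """Flatten per-frame lists of (x, y) landmarks into a T x 2K matrix (list of rows)."""
    return [[coord for point in frame for coord in point] for frame in frames]


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


# Three frames with two hypothetical landmarks each (K = 2, so rows have 4 values).
frames = [[(0.1, 0.2), (0.3, 0.4)],
          [(0.1, 0.25), (0.3, 0.45)],
          [(0.1, 0.2), (0.3, 0.4)]]
matrix = landmarks_to_matrix(frames)

# Made-up per-frame scores over {blank, label 1, label 2} for five time steps.
scores = [[0.1, 0.8, 0.1],
          [0.1, 0.7, 0.2],
          [0.9, 0.05, 0.05],
          [0.2, 0.1, 0.7],
          [0.2, 0.1, 0.7]]
decoded = ctc_greedy_decode(scores)
```

Here five time steps collapse to a two-label output, which illustrates how CTC lets input and output lengths differ without frame-level alignment. A blank between two identical labels keeps them distinct, so a repeated label in the output is still representable.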

Original language: English
Title of host publication: 2025 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665457392
DOIs
Publication status: Published - 2025
Event: 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025 - Istanbul, Türkiye
Duration: 13 Oct 2025 to 16 Oct 2025

Publication series

Name: 2025 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025

Conference

Conference: 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025
Country/Territory: Türkiye
City: Istanbul
Period: 13/10/25 to 16/10/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.
