Abstract
Lip reading is a challenging visual recognition task that typically requires substantial computational resources. In this study, we propose a lightweight and efficient approach to visual speech recognition that uses facial landmark coordinates instead of raw video input. By representing full facial movements through these coordinates, our method significantly reduces computational complexity while retaining essential visual information. Our pipeline begins with face detection and landmark extraction for each batch of frames representing a single word. The extracted landmarks are organized into an input matrix, which is then fed into a spatio-temporal neural network. To model temporal dynamics and perform word recognition, we employ a compact spatio-temporal CNN-RNN network trained with Connectionist Temporal Classification (CTC) loss, which allows alignment of variable-length inputs. Experiments on the MIRACL-VC1 dataset demonstrate that our CTC-based model achieves 93.33% word-level accuracy with significantly reduced inference time. The proposed method delivers a highly efficient lip-reading pipeline, well suited to real-time or edge-device deployment.
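The two ends of the pipeline described above, building the landmark input matrix and turning per-frame CTC outputs into a label sequence, can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the abstract does not specify the number of landmarks, the normalization, or the decoding rule, so the per-frame centering and scaling, the greedy (best-path) decoding, and the names `landmarks_to_matrix` and `ctc_greedy_decode` are all hypothetical choices, not the authors' implementation. The CNN-RNN network itself is omitted.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def landmarks_to_matrix(frames: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Flatten per-frame (x, y) landmarks into a T x 2K matrix.

    Assumption (not from the paper): each frame is centered on the mean of
    its landmarks and divided by its largest absolute offset, so rows are
    roughly invariant to face position and size.
    """
    matrix: List[List[float]] = []
    for points in frames:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        # Fall back to 1.0 if all landmarks coincide, to avoid dividing by zero.
        scale = max(max(abs(x - cx), abs(y - cy)) for x, y in points) or 1.0
        row: List[float] = []
        for x, y in points:
            row.extend([(x - cx) / scale, (y - cy) / scale])
        matrix.append(row)
    return matrix


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        label = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Here `frames` would hold one list of landmark points per video frame of a single word, and `frame_scores` would hold the network's per-frame class scores, with index `blank` reserved for the CTC blank symbol. Because repeated labels are merged and blanks dropped, input sequences of different lengths can map to the same output, which is the alignment property the abstract attributes to CTC.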
| Original language | English |
|---|---|
| Title of host publication | 2025 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665457392 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025 - Istanbul, Türkiye. Duration: 13 Oct 2025 → 16 Oct 2025 |
Publication series
| Name | 2025 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025 |
|---|---|
Conference
| Conference | 14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025 |
|---|---|
| Country/Region | Türkiye |
| City | Istanbul |
| Period | 13/10/25 → 16/10/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Fingerprint
Dive into the research topics of 'Landmark-Based Fast LIP Reading with CTC Loss'. Together they form a unique fingerprint.