Audio-Driven Talking Face Generation with Stabilized Synchronization Loss

Dogucan Yaman*, Fevziye Irem Eyiokur, Leonard Bärmann, Hazım Kemal Ekenel, Alexander Waibel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Talking face generation aims to create realistic videos with accurate lip synchronization and high visual quality, using given audio and reference video while preserving identity and visual characteristics. In this paper, we start by identifying several issues with existing synchronization learning methods. These involve unstable training, lip synchronization, and visual quality issues caused by lip-sync loss, SyncNet, and lip leaking from the identity reference. To address these issues, we first tackle the lip leaking problem by introducing a silent-lip generator, which changes the lips of the identity reference to alleviate leakage. We then introduce stabilized synchronization loss and AVSyncNet to overcome problems caused by lip-sync loss and SyncNet. Experiments show that our model outperforms state-of-the-art methods in both visual quality and lip synchronization. Comprehensive ablation studies further validate our individual contributions and their cohesive effects.

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 417-435
Number of pages: 19
ISBN (Print): 9783031726545
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 to 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15077 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
