Abstract
In this paper, we propose a neural end-to-end system for voice-preserving, lip-synchronous video translation. The system combines multiple component models to produce a video of the original speaker speaking the target language. The video is lip-synchronous with the target-language speech, yet preserves the original speaker's speech emphasis, voice characteristics, and face video. The result is a video of the speaker speaking a language they do not actually know. For evaluation, we present a user study of the complete system as well as separate evaluations of the individual components. Because no existing dataset is suitable for evaluating the whole system, we collected our own test set. The results indicate that our system can generate convincing videos of the original speaker speaking the target language while preserving the speaker's characteristics.
Original language | English |
---|---|
Host publication title | ICASSPW 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350302615 |
DOIs | |
Publication status | Published - 2023 |
Event | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023 - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023 |
Publication series
Name | ICASSPW 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings |
---|---|
Conference
Conference | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023 |
---|---|
Country/Territory | Greece |
City | Rhodes Island |
Period | 4 Jun 2023 → 10 Jun 2023 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.