Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS

Ramazan Gokay, Hulya Yalcin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

One of the major problems facing speech recognition researchers is the lack of data. In this paper, we compare alternative solutions to this problem. Experiments are conducted with very limited training data to measure the effects of data augmentation and speech synthesis on speech recognition. For data augmentation, speed and volume perturbations are applied. In addition, synthetic speech is generated using two different speech synthesis methods. In the first approach, Google Translate Text-to-Speech (gTTS) is used as the speech synthesizer. In the second approach, we train an end-to-end Turkish TTS system. Finally, we examine the effects of all these methods on speech recognition for low-resource languages. Our results demonstrate that some data augmentation and speech synthesis techniques improve speech recognition for low-resource languages. A 14.8% relative word error rate (WER) improvement is obtained by combining augmented and synthetic data.
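
The two perturbations and the relative WER metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear-interpolation resampler, the speed factors and gains, and the WER figures in the usage note are all assumptions chosen for illustration. Speed perturbation is modeled here as plain resampling, which changes both duration and pitch.

```python
import math


def speed_perturb(samples, factor):
    """Resample so the audio plays `factor` times faster (assumed: linear interpolation)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    n_out = max(1, int(round(n_in / factor)))
    if n_in < 2 or n_out < 2:
        return list(samples[:n_out])
    # Map each output index onto a fractional position in the input signal.
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(math.floor(pos)), n_in - 2)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def volume_perturb(samples, gain):
    """Scale the amplitude by `gain`, clipping to the [-1.0, 1.0] range."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative WER improvement in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def augment(samples, speed_factors=(0.9, 1.1), gains=(0.5, 1.5)):
    """Return perturbed copies of one utterance (hypothetical factor and gain choices)."""
    copies = [speed_perturb(samples, f) for f in speed_factors]
    copies += [volume_perturb(samples, g) for g in gains]
    return copies
```

As a usage example, `augment(samples)` returns four perturbed copies of one utterance held as a list of floats in [-1.0, 1.0]. With made-up numbers, a baseline WER of 40.0 and a new WER of 30.0 give `relative_wer_improvement(40.0, 30.0)` of 25.0; these figures are illustrative only and are not results from the paper.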

Original language: English
Title of host publication: 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 357-360
Number of pages: 4
ISBN (Electronic): 9781728118208
Publication status: Published - Mar 2019
Event: 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019 - Istanbul, Turkey
Duration: 21 Mar 2019 - 24 Mar 2019

Publication series

Name: 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019

Conference

Conference: 16th International Multi-Conference on Systems, Signals and Devices, SSD 2019
Country/Territory: Turkey
City: Istanbul
Period: 21/03/19 - 24/03/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • data augmentation
  • low resource languages
  • speech recognition
  • speech synthesis
