Abstract
With the widespread use of voice-controlled services and devices, research on robust and fast systems for automatic speaker identification has accelerated. In this paper, we present a Convolutional Neural Network (CNN) architecture for text-independent automatic speaker identification. The primary purpose is to identify a speaker, among many others, from a short speech segment. Most current research focuses on deep CNNs that were originally designed for computer vision tasks. Moreover, most existing speaker identification methods require audio samples longer than 3 seconds in the query phase to achieve high accuracy. We designed a CNN architecture suited to voice and speech-related classification tasks. In our experiments, the proposed model achieves 99.5% accuracy on LibriSpeech and 90% accuracy on the VoxCeleb 1 dataset using only 1-second test utterances.
Original language | English |
---|---|
Title of host publication | Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 413-418 |
Number of pages | 6 |
ISBN (Electronic) | 9781665429085 |
Publication status | Published - 2021 |
Event | 6th International Conference on Computer Science and Engineering, UBMK 2021 - Ankara, Turkey; Duration: 15 Sept 2021 → 17 Sept 2021 |
Publication series
Name | Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021 |
---|
Conference
Conference | 6th International Conference on Computer Science and Engineering, UBMK 2021 |
---|---|
Country/Territory | Turkey |
City | Ankara |
Period | 15/09/21 → 17/09/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE
Funding
This work has been supported by Arcelik ITU R&D Center and the Scientific Project Unit (BAP) of Istanbul Technical University, project number MOA-2019-42321. The authors thank Cagri Aslanbas, Berna Erden, Pinar Baki, Ugur Halatoglu and Baris Bayram for fruitful discussions.
Funders | Funder number |
---|---|
Arcelik ITU R&D Center | |
Istanbul Technical University, Scientific Project Unit (BAP) | MOA-2019-42321 |
Keywords
- Convolutional Neural Networks
- Deep Learning
- Signal Processing
- Speaker Identification