Abstract
Synthetic data generation is one of the methods used in machine learning to improve the performance of algorithms on datasets. However, these methods do not guarantee success on every dataset. This study investigates which types of synthetic data generation algorithms are useful for which datasets by examining the effects of the SMOTE, Borderline-SMOTE and Random data generation algorithms on 33 datasets. To this end, each dataset was fully balanced through synthetic data generation. To evaluate the results, the datasets were divided into three groups according to their imbalance ratio: balanced, partially balanced-unbalanced, and unbalanced. ANN models were trained on the original datasets and on the datasets produced by each generation algorithm, and their performance was evaluated on the test set. Experimental results show that adding synthetic data with the above-mentioned algorithms generally improves performance on balanced and partially balanced-unbalanced datasets, but generally does not help on unbalanced datasets. Borderline-SMOTE, which generates border samples, was more successful on balanced datasets, and SMOTE was more successful on partially balanced-unbalanced datasets.
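
The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not give. It assumes plain SMOTE-style interpolation (not Borderline-SMOTE) and assumes the imbalance ratio is the majority-class count divided by the minority-class count. The names `imbalance_ratio`, `smote_like` and `balance` are hypothetical, and the toy data are made up for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two samples in `minority`."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the majority-class count."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            minority = [x for x, lab in zip(X, y) if lab == label]
            new_points = smote_like(minority, target - count, k=k, seed=seed)
            X_out.extend(new_points)
            y_out.extend([label] * len(new_points))
    return X_out, y_out


# Toy data: six samples of class 0 and two samples of class 1.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.1),
     (1.0, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
```

In this sketch `imbalance_ratio(y)` is 3.0 before balancing and `imbalance_ratio(y_bal)` is 1.0 afterwards, which corresponds to the "fully balanced" condition in the abstract. The thresholds that separate the three dataset groups are not stated in the abstract, so they are left out here.
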
Translated title of the contribution | When does synthetic data generation work? |
---|---|
Original language | Turkish |
Title of host publication | SIU 2021 - 29th IEEE Conference on Signal Processing and Communications Applications, Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781665436496 |
Publication status | Published - 9 Jun 2021 |
Event | 29th IEEE Conference on Signal Processing and Communications Applications, SIU 2021 - Virtual, Istanbul, Turkey. Duration: 9 Jun 2021 → 11 Jun 2021 |
Publication series
Name | SIU 2021 - 29th IEEE Conference on Signal Processing and Communications Applications, Proceedings |
---|---|
Conference
Conference | 29th IEEE Conference on Signal Processing and Communications Applications, SIU 2021 |
---|---|
Country/Territory | Turkey |
City | Virtual, Istanbul |
Period | 9/06/21 → 11/06/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.