Abstract
This study aims to understand non-linear relations in Turkish textual content in order to predict its topic with machine learning models, and to discuss the contributions of these models to compliance with the Personal Data Protection Rule (PDPR). With the exponential growth of concerns about personal data processing, it has become necessary to know the topic of textual content in order to interpret how it is used in the environment where it is located. The topic of a document is inclusive information conveyed together with all the data in its content; for this reason, the categories of personal data defined in the PDPR may vary due to semantic effects from the environment in which the data are processed. Many experiments are conducted with a fastText model employing logistic regression and with Bidirectional Encoder Representations from Transformers (BERT) models, which contain attention layers. Model performance and the results of statistical post-hoc tests on the model predictions are analyzed, and model deployment and industrial usage are then discussed. As a result, a fastText model with a macro-averaged F1-measure of 94.73 ± 0.67% is created and efficiently integrated into production.
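The abstract reports its headline result as a macro-averaged F1-measure with a ± spread. The following is a minimal sketch of how such a figure can be computed. It assumes the spread is the sample standard deviation over repeated runs or cross-validation folds, which the abstract does not state. The function names `macro_f1` and `summarize` and the topic labels are hypothetical illustrations, not taken from the paper.

```python
from statistics import mean, stdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging).

    Every class counts equally, so rare topics weigh as much as frequent ones.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined precision/recall (zero denominator) is treated as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return mean(scores)


def summarize(runs):
    """Return (mean, sample std) of macro-F1 over runs, in percent.

    Assumption: `runs` is a list of (y_true, y_pred) pairs, one per fold
    or repeated run; at least two runs are required by stdev().
    """
    values = [100 * macro_f1(t, p) for t, p in runs]
    return mean(values), stdev(values)


# Hypothetical topic labels for illustration only.
y_true = ["health", "finance", "sports", "health"]
y_pred = ["health", "sports", "sports", "health"]
print(macro_f1(y_true, y_pred))                       # 5/9, about 0.556
print(summarize([(y_true, y_true), (y_true, y_pred)]))
```

Here the missed "finance" document drives that class's F1 to zero, which pulls the macro average down far more than plain accuracy (0.75) would suggest. This sensitivity to poorly served classes is why macro-averaging is a common choice for multi-class topic prediction.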
Original language | English |
---|---|
Title of host publication | Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 292-297 |
Number of pages | 6 |
ISBN (Electronic) | 9781665429085 |
Publication status | Published - 2021 |
Event | 6th International Conference on Computer Science and Engineering, UBMK 2021 - Ankara, Turkey |
Duration | 15 Sept 2021 → 17 Sept 2021 |
Publication series
Name | Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021 |
---|---|
Conference
Conference | 6th International Conference on Computer Science and Engineering, UBMK 2021 |
---|---|
Country/Territory | Turkey |
City | Ankara |
Period | 15/09/21 → 17/09/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE
Keywords
- Bidirectional encoder representations
- Fasttext
- Personal data processing
- Topic prediction