Deep Learning Based Topic Classification for Sensitivity Assignment to Personal Data

Apdullah Yayık, Hasan Apik, Ayşe Tosun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This study aims to capture non-linear relations in Turkish textual content in order to predict its topic with machine learning models, and to discuss how these models contribute to compliance with the Personal Data Protection Rule (PDPR). With the exponential growth of concerns about personal data processing, it has become necessary to know the topic of textual content in order to interpret how it is used in the environment where it is located. The topic of a document is inclusive information presented together with all the data in its content; for this reason, the categories of personal data defined in the PDPR may vary due to semantic effects of the environment in which the data is processed. Many experiments are conducted with the fasttext model, which employs logistic regression, and with deep bidirectional transformer (BERT) models, which have attention layers. Model performance and statistical post-hoc test results on the model predictions are analyzed, and then model deployment and industrial usage are discussed. As a result, a fasttext model with a macro-averaged F1-measure of 94.73 ± 0.67% is created and integrated into production efficiently.
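For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1-measure and a "mean ± spread" summary could be computed. It is not the authors' evaluation code: the function names `macro_f1` and `summarize` are hypothetical, and reading the ± value as a sample standard deviation over repeated runs or folds is an assumption, since the abstract does not say how it was obtained.

```python
from statistics import mean, stdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no
        # predictions contributes 0.0 here (a convention, not a standard).
        per_class.append(2 * tp / denom if denom else 0.0)
    return mean(per_class)


def summarize(scores):
    """Mean and sample standard deviation of scores from repeated runs.

    Assumption: the '±' in the abstract is a spread over runs or folds.
    """
    return mean(scores), stdev(scores)
```

Because every class contributes equally to the macro average regardless of its size, a rare topic that is classified poorly lowers the score as much as a frequent one would.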

Original language: English
Title of host publication: Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 292-297
Number of pages: 6
ISBN (Electronic): 9781665429085
Publication status: Published - 2021
Event: 6th International Conference on Computer Science and Engineering, UBMK 2021 - Ankara, Turkey
Duration: 15 Sept 2021 to 17 Sept 2021

Publication series

Name: Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021

Conference

Conference: 6th International Conference on Computer Science and Engineering, UBMK 2021
Country/Territory: Turkey
City: Ankara
Period: 15/09/21 to 17/09/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE

Keywords

  • Bidirectional encoder representations
  • Fasttext
  • Personal data processing
  • Topic prediction
