Analysis of preprocessing methods on classification of Turkish texts

Dilara Torunoǧlu*, Erhan Çakirman, Murat Can Ganiz, Selim Akyokuş, M. Zahid Gürbüz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Citations (Scopus)

Abstract

Preprocessing is a critical step in information retrieval and text mining. The objective of this study is to analyze the effect of preprocessing methods on the classification of Turkish texts. We compiled two large datasets from Turkish newspapers using a crawler. On these two datasets and two additional datasets, we perform a detailed analysis of preprocessing methods such as stemming, stopword filtering, and word weighting for Turkish text classification. We report the results of extensive experiments.

Original language: English
Title of host publication: INISTA 2011 - 2011 International Symposium on INnovations in Intelligent SysTems and Applications
Pages: 112-117
Number of pages: 6
Publication status: Published - 2011
Externally published: Yes
Event: 2011 International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2011 - Istanbul-Kadikoy, Turkey
Duration: 15 Jun 2011 → 18 Jun 2011

Publication series

Name: INISTA 2011 - 2011 International Symposium on INnovations in Intelligent SysTems and Applications

Conference

Conference: 2011 International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2011
Country/Territory: Turkey
City: Istanbul-Kadikoy
Period: 15/06/11 → 18/06/11

Keywords

  • Data preprocessing
  • Stemming
  • Stopword removal
  • Text Classification
  • Turkish Text Classification
