A comparison study on ensemble strategies and feature sets for sentiment analysis

Deniz Aldogan*, Yusuf Yaslan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

This paper compares common base and ensemble classifiers for sentiment classification of reviews. It also generates different feature sets and observes their contribution to classification accuracy. These feature sets are formed in a hierarchical manner: part-of-speech (POS) based word groups are formed first, and feature frequencies, SentiWordNet scores and their combination are then used to obtain the feature sets. Several common base classifiers, namely Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Voted Perceptron (VP) and K-Nearest Neighbor (k-NN), as well as the common ensemble strategies Random Forests (RFs), Stacking and Random Subspace (RSS), are each tested on the generated feature sets. The Behavior-Knowledge Space (BKS) method has also been derived for application to the set of outcomes for different algorithm and feature set combinations, and a probability-based meta-classifier technique has been tested on the same set of outcomes. Finally, the Information Gain (IG) feature selection technique has been applied to reduce the feature spaces. The experiments are conducted on a widely used movie review dataset and an equally common multi-domain review dataset. The results indicate that the probabilistic ensemble method generally gives better results than the other algorithms tested on the chosen datasets, and that the IG method can be used to save computational time while maintaining acceptable accuracy.
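As an illustration of the BKS idea mentioned in the abstract, the following is a minimal sketch of the textbook Behavior-Knowledge Space combiner, not the variant derived in the paper. It assumes each base classifier (or algorithm and feature set combination) emits a hard label per review; the function names `fit_bks` and `predict_bks`, the toy labels, and the majority-vote fallback for unseen cells are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_bks(base_predictions, true_labels):
    """Build a BKS lookup table.

    Each distinct tuple of base-classifier outputs is one cell; the cell
    counts how often each true label occurred with that tuple.
    """
    table = defaultdict(Counter)
    for outputs, label in zip(base_predictions, true_labels):
        table[tuple(outputs)][label] += 1
    return table


def predict_bks(table, outputs):
    """Return the most frequent true label stored in the matching cell."""
    cell = table.get(tuple(outputs))
    if cell:
        return cell.most_common(1)[0][0]
    # Assumption: for a cell never seen in training, fall back to a
    # plain majority vote over the base-classifier outputs.
    return Counter(outputs).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers on five training reviews.
train_preds = [
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("neg", "neg", "neg"),
    ("pos", "neg", "pos"),
]
train_labels = ["neg", "neg", "pos", "neg", "pos"]

table = fit_bks(train_preds, train_labels)
seen = predict_bks(table, ("pos", "pos", "neg"))    # cell observed in training
unseen = predict_bks(table, ("neg", "pos", "pos"))  # cell not observed
```

In this toy data the cell for `("pos", "pos", "neg")` was labelled negative more often than positive, so the table overrides what a simple majority vote would choose; this is the behavior that distinguishes BKS from voting.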

Original language: English
Title of host publication: Information Sciences and Systems 2015 - 30th International Symposium on Computer and Information Sciences, ISCIS 2015
Editors: Omer H. Abdelrahman, Gokce Gorbil, Ricardo Lent, Erol Gelenbe
Publisher: Springer Verlag
Pages: 359-370
Number of pages: 12
ISBN (Print): 9783319226347
Publication status: Published - 2016
Event: 30th International Symposium on Computer and Information Sciences, ISCIS 2015 - London, United Kingdom
Duration: 21 Sept 2015 - 24 Sept 2015

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 363
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 30th International Symposium on Computer and Information Sciences, ISCIS 2015
Country/Territory: United Kingdom
City: London
Period: 21/09/15 - 24/09/15

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2016.

Keywords

  • Ensemble algorithms
  • Machine learning
  • Sentiment analysis
  • Text classification
