Exploring the power of supervised learning methods for company name disambiguation in microblog posts

Esma Nafiye Polat*, Ali Çakmak, Rabia Nuray Turan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Twitter is an online social networking website where people can post short messages on any subject, and these messages become visible to other users. Users intentionally express their opinions about companies or products via microblogging texts. Analyzing such messages can help reveal what customers think about a company's products and what their overall sentiment is. Identifying tweets that refer to products and companies has therefore become an important task. However, company names are often ambiguous. Hence, the first step is to locate the messages that are relevant to a company. In this paper, we present a number of supervised learning techniques to decide whether a given tweet is about a company, e.g., whether a message containing the term ‘amazon’ is related to the company Amazon Inc. or not. This task is more challenging than classical text classification. The main difficulty is that tweets and company names carry limited information. To make the task tractable, external resources are used to obtain richer data about a company. More specifically, we generate several profiles for each organization, which contain richer information. We then perform feature extraction to obtain both numerical and categorical features, and feature selection to identify the attributes most relevant to our task. Finally, we train several supervised classifiers. Our constructed classifiers and carefully selected features provide high accuracy on the WePS-3 dataset. Our results show an 11% improvement in accuracy over baseline approaches.
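The pipeline outlined in the abstract (company profile, feature extraction, supervised classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the keyword set `AMAZON_PROFILE`, the two overlap features, the four toy tweets, and the perceptron, which merely stands in for the unspecified classifiers. The WePS-3 data and the feature selection step are not reproduced.

```python
import re

# Hypothetical company profile: keywords that would come from external resources.
AMAZON_PROFILE = {"kindle", "prime", "shipping", "order", "aws", "delivery"}


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(tweet, profile):
    """Map a tweet to numerical features relative to a company profile."""
    tokens = set(tokenize(tweet))
    overlap = tokens & profile
    union = tokens | profile
    return [
        float(len(overlap)),                          # shared keyword count
        len(overlap) / len(union) if union else 0.0,  # Jaccard similarity
        1.0,                                          # constant bias term
    ]


def predict(weights, features):
    """Return 1 (tweet is about the company) if the linear score is positive."""
    score = sum(w * v for w, v in zip(weights, features))
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=20):
    """Train a perceptron (an illustrative stand-in classifier)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, features)
            if error:
                # Move the weights toward the misclassified example's label.
                weights = [w + error * v for w, v in zip(weights, features)]
    return weights


# Toy labelled tweets (invented for this sketch): 1 = company, 0 = other sense.
TRAIN_TWEETS = [
    ("My amazon order shipped with prime delivery", 1),
    ("New kindle from amazon arrived", 1),
    ("Boat trip on the amazon river", 0),
    ("The amazon rainforest is burning", 0),
]

samples = [extract_features(text, AMAZON_PROFILE) for text, _ in TRAIN_TWEETS]
labels = [label for _, label in TRAIN_TWEETS]
weights = train_perceptron(samples, labels)

for tweet in ("amazon prime shipping is fast", "amazon jungle expedition"):
    print(tweet, "->", predict(weights, extract_features(tweet, AMAZON_PROFILE)))
```

In this toy setting the only signal is overlap with the profile, which mirrors the abstract's point that a tweet alone carries too little information and the profile supplies the context needed to separate the two senses of the name.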

Original language: English
Pages (from-to): 2400-2415
Number of pages: 16
Journal: Turkish Journal of Electrical Engineering and Computer Sciences
Volume: 28
Issue number: 5
Publication status: Published - 2020

Bibliographical note

Publisher Copyright:
© TÜBİTAK

Keywords

  • Entity resolution
  • Microblogs
  • Name disambiguation
  • Supervised classification
  • Text processing
