
Exploring the power of supervised learning methods for company name disambiguation in microblog posts

  • Esma Nafiye Polat*
  • Ali Çakmak
  • Rabia Nuray Turan
  • *Corresponding author for this work
  • Kuveyt Türk Participation Bank
  • Istanbul Technical University
  • Interos Inc.

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Twitter is an online social networking website where people post short messages on any subject, and these messages become visible to other users. Users deliberately express their opinions about companies or products in such microblog posts. Analyzing these messages can help reveal what customers think about a company's products, or what their overall sentiment is. Identifying tweets that refer to products and companies has therefore recently become an important task. However, company names are often ambiguous, so the first step is to locate the messages that are relevant to a given company. In this paper, we present a number of supervised learning techniques to decide whether a given tweet is about a company, e.g., whether a message containing the term ‘amazon’ is related to the company Amazon Inc. or not. This task is more challenging than classical text classification, mainly because both tweets and company names carry very limited information. To make the task tractable, we use external resources to obtain richer data about each company. More specifically, we generate several profiles for each organization that contain this richer information. We then perform feature extraction to obtain both numerical and categorical features, followed by feature selection to identify the attributes most relevant to our task. Finally, we train several supervised classifiers. Our classifiers, combined with carefully selected features, achieve high accuracy on the WePS-3 dataset, improving accuracy by 11% over baseline approaches.
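The pipeline summarized in the abstract (company profile, numerical and categorical features, supervised classifier) can be illustrated with a small, self-contained sketch. Everything concrete in it is an assumption for illustration and is not the authors' method: the hand-written term set `AMAZON_PROFILE` stands in for a profile built from external resources, the four features in the hypothetical `extract_features` are invented, the labelled tweets are toy data, and logistic regression trained by batch gradient descent stands in for the paper's "several supervised classifiers". Feature selection is omitted.

```python
import math
import re
from typing import List, Set, Tuple

# Hypothetical profile for the company "amazon": terms that might be collected
# from external resources. Illustrative only, not the paper's profile data.
AMAZON_PROFILE: Set[str] = {"kindle", "prime", "aws", "shipping", "order", "echo", "bezos"}

URL_RE = re.compile(r"https?://\S+")


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens with URLs removed."""
    return re.findall(r"[a-z0-9']+", URL_RE.sub(" ", text).lower())


def extract_features(tweet: str, name: str, profile: Set[str]) -> List[float]:
    """Hypothetical features: one numerical count plus three binary (categorical) flags."""
    overlap = sum(1 for tok in tokenize(tweet) if tok in profile)
    return [
        float(overlap),                                       # profile-term hits
        1.0 if URL_RE.search(tweet) else 0.0,                 # tweet contains a URL
        1.0 if name.capitalize() in tweet else 0.0,           # name written capitalized
        1.0 if "@" + name.lower() in tweet.lower() else 0.0,  # tweet mentions @name
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """1 = tweet is about the company, 0 = it is not."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy labelled tweets, invented for illustration (1 = about Amazon Inc.).
TRAIN = [
    ("Just got my new Kindle from Amazon, shipping was fast", 1),
    ("Amazon Prime order arrived a day early http://t.co/x", 1),
    ("@amazon my echo stopped working", 1),
    ("Rafting down the amazon river this summer", 0),
    ("The Amazon rainforest is shrinking http://t.co/y", 0),
    ("amazon warriors in greek myth", 0),
]

X = [extract_features(text, "amazon", AMAZON_PROFILE) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_logreg(X, y)

if __name__ == "__main__":
    for text, _ in TRAIN:
        print(predict(w, b, extract_features(text, "amazon", AMAZON_PROFILE)), text)
```

In this toy setup the profile-overlap count is what separates the two classes, which mirrors the abstract's point that the short tweet alone says little and the externally built company profile supplies most of the signal.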

Original language: English
Pages (from-to): 2400-2415
Number of pages: 16
Journal: Turkish Journal of Electrical Engineering and Computer Sciences
Volume: 28
Issue number: 5
DOIs:
Publication status: Published - 2020

Bibliographical note

Publisher Copyright:
© TÜBİTAK

