Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification

Eray Mert Kavuk, Ayse Tosun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

This work proposes to predict the tags assigned to posts on the Stack Overflow platform. The raw data was obtained from stackexchange.com and includes more than 50K posts and their associated tags given by users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class, multi-label classification; hence, we propose 1) one-against-all models for the 15 most frequently used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers, which decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best-performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.
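
The combination step described in the abstract can be illustrated with a minimal sketch. It assumes each one-against-all classifier has already produced a probability for its tag, and that the multi-tag result is simply the K tags with the highest probabilities; the abstract does not state the exact combination rule, so this is an assumption rather than the authors' method. The function names, the example tags, and the probability values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k_tags(tag_probs: Dict[str, float], k: int) -> List[str]:
    """Return the k tags with the highest one-against-all probability."""
    # Sort by descending probability; ties are broken alphabetically
    # so the result is deterministic.
    ranked = sorted(tag_probs.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def recall_precision_f1(predicted: List[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Per-post recall, precision, and F1 for a predicted tag list."""
    hits = len(set(predicted) & actual)
    recall = hits / len(actual) if actual else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical probabilities from three one-against-all classifiers for one post.
probs = {"python": 0.81, "java": 0.10, "pandas": 0.64}
predicted = top_k_tags(probs, k=2)

# Hypothetical ground-truth tags assigned by the user.
recall, precision, f1 = recall_precision_f1(predicted, {"python", "numpy"})
```

Ranking by raw probability is the simplest choice; per-tag thresholds or score calibration would be alternatives if the one-against-all classifiers are not comparably calibrated.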

Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW 2020
Publisher: Association for Computing Machinery, Inc
Pages: 489-493
Number of pages: 5
ISBN (Electronic): 9781450379632
Publication status: Published - 27 Jun 2020
Event: 42nd IEEE/ACM International Conference on Software Engineering Workshops, ICSEW 2020 - Seoul, Korea, Republic of
Duration: 27 Jun 2020 → 19 Jul 2020

Publication series

Name: Proceedings - 2020 IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW 2020

Conference

Conference: 42nd IEEE/ACM International Conference on Software Engineering Workshops, ICSEW 2020
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/06/20 → 19/07/20

Bibliographical note

Publisher Copyright:
© 2020 ACM.

Keywords

  • Latent Dirichlet Allocation
  • Stack Overflow
  • tag prediction
