Abstract
This work predicts the tags assigned to posts on the Stack Overflow platform. The raw data, obtained from stackexchange.com, comprises more than 50K posts and the tags that users assigned to them. The questions and titles of the posts are pre-processed, and the sentences in the posts are then transformed into features via Latent Dirichlet Allocation. Because the problem is a multi-class, multi-label classification, we propose 1) one-against-all models for the 15 most popular tags, and 2) a combined multi-tag classifier that finds the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers, which decide to what extent a post belongs to a tag. The per-tag probabilities for each post are then combined, using the best-performing algorithm, to produce the output of the multi-tag classifier. Performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and a 39% F1-score.
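The combination step described in the abstract can be illustrated with a minimal sketch. It assumes each one-against-all classifier has already produced a probability for a post, and it simply ranks those probabilities to pick the top K tags. The function names, the example probabilities, and the per-post way of computing recall and F1 are illustrative assumptions, not the authors' implementation or their evaluation protocol.

```python
from typing import Dict, List, Set, Tuple


def top_k_tags(tag_probs: Dict[str, float], k: int) -> List[str]:
    """Return the k tags with the highest one-against-all probability."""
    # Sort by probability (descending); break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(tag_probs.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def recall_and_f1(predicted: List[str], actual: Set[str]) -> Tuple[float, float]:
    """Compute recall and F1 for a single post (an assumed, per-post metric)."""
    hits = len(set(predicted) & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Hypothetical probabilities, standing in for the outputs of the
# one-against-all classifiers for one post.
post_probs = {"python": 0.81, "pandas": 0.64, "java": 0.05, "c#": 0.02}

predicted = top_k_tags(post_probs, k=3)
recall, f1 = recall_and_f1(predicted, actual={"python", "pandas"})
print(predicted, recall, f1)
```

A larger K tends to raise recall while lowering precision, which is one plausible reason a top-K tagger can report recall well above its F1-score.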
Original language | English |
---|---|
Title of host publication | Proceedings - 2020 IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW 2020 |
Publisher | Association for Computing Machinery, Inc |
Pages | 489-493 |
Number of pages | 5 |
ISBN (Electronic) | 9781450379632 |
DOIs | |
Publication status | Published - 27 Jun 2020 |
Event | 42nd IEEE/ACM International Conference on Software Engineering Workshops, ICSEW 2020 - Seoul, Korea, Republic of. Duration: 27 Jun 2020 → 19 Jul 2020 |
Publication series
Name | Proceedings - 2020 IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW 2020 |
---|
Conference
Conference | 42nd IEEE/ACM International Conference on Software Engineering Workshops, ICSEW 2020 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 27/06/20 → 19/07/20 |
Bibliographic note
Publisher Copyright: © 2020 ACM.