TY - JOUR
T1 - Investigating the performance of personalized models for software defect prediction
AU - Eken, Beyza
AU - Tosun, Ayse
N1 - Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2021/11
Y1 - 2021/11
AB - Software defect predictors that explore the developer perspective reveal that code changes made by different developers tend to exhibit different defect patterns. Personalized defect prediction supports this view and yields promising results. We aim to investigate the performance of personalized defect predictors compared to that of traditional models. We conduct an empirical study on six open-source projects covering 222 developers. Personalized and traditional defect predictors are built using two algorithms and cross-validation on historical commit data, and are assessed via seven performance measures and statistical tests. Our results show that personalized models (PMs) achieve an increase of up to 24% in recall for 83% of developers, while causing higher false alarm rates for 77% of developers. PMs perform better for developers who contribute to modules with many prior contributors. Although size metrics contribute to the performance of the majority of the PMs, the PMs differ significantly in the information gained from experience, diffusion, and history metrics. Whether a PM should be chosen over a traditional model depends on a set of factors, e.g., the selected algorithm, the model validation strategy, and the performance measures, and hence PM performance differs significantly with respect to these factors.
KW - Change-level
KW - Defect prediction
KW - Personalized
KW - Software recommendation systems
UR - http://www.scopus.com/inward/record.url?scp=85110405837&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2021.111038
DO - 10.1016/j.jss.2021.111038
M3 - Article
AN - SCOPUS:85110405837
SN - 0164-1212
VL - 181
JO - Journal of Systems and Software
JF - Journal of Systems and Software
M1 - 111038
ER -