Empirical evaluation of the effects of mixed project data on learning defect predictors

  • University of Oulu
  • Toronto Metropolitan University

Research output: Contribution to journal › Article › peer-review

114 Citations (Scopus)

Abstract

Context: Defect prediction research mostly focuses on optimizing the performance of models that are constructed for isolated projects (i.e. within project (WP)) through retrospective analyses. On the other hand, recent studies try to utilize data across projects (i.e. cross project (CP)) for building defect prediction models for new projects. There are no cases where within and cross project data are combined (i.e. mixed project data). Objective: Our goal is to investigate the merits of using mixed project data for binary defect prediction. Specifically, we want to check whether it is feasible, in terms of defect detection performance, to use data from other projects in two cases: (i) when there is an existing within project history and (ii) when there are limited within project data. Method: We use data from 73 versions of 41 publicly available projects. We simulate the two above-mentioned cases and compare the performance of naive Bayes classifiers trained on within project data vs. mixed project data. Results: For the first case, we find that the performance of mixed project predictors significantly improves over full within project predictors (p-value < 0.001); however, the effect size is small (Hedges' g = 0.25). For the second case, we find that mixed project predictors are comparable to full within project predictors while using only 10% of the available within project data (p-value = 0.002, g = 0.17). Conclusion: We conclude that the extra effort associated with collecting data from other projects is not feasible in terms of practical performance improvement when there is already an established within project defect predictor using full project history. However, when there is limited project history, e.g. in early phases of development, mixed project predictions are justifiable as they perform as well as full within project models.
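The effect sizes in the abstract are reported as Hedges' g. As a minimal sketch of how such a value can be computed, the snippet below uses the common two-independent-sample formulation with the usual small-sample correction. Whether the paper uses exactly this variant is an assumption, and the function name `hedges_g` and the score lists are hypothetical, invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(a, b):
    """Bias-corrected standardized mean difference between two samples.

    Assumes two independent samples; a paired design would need a
    different standardizer.
    """
    n1, n2 = len(a), len(b)
    # Pooled standard deviation from the two unbiased sample variances.
    pooled_sd = sqrt(
        ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    )
    cohens_d = (mean(a) - mean(b)) / pooled_sd
    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


# Hypothetical performance scores (not taken from the paper).
mixed_scores = [0.72, 0.68, 0.75, 0.70, 0.66]
within_scores = [0.70, 0.66, 0.71, 0.69, 0.65]
g = hedges_g(mixed_scores, within_scores)
```

By the conventional rule of thumb, values of g around 0.2 are read as small effects, which is consistent with how the abstract describes g = 0.25 and g = 0.17.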

Original language: English
Pages (from-to): 1101-1118
Number of pages: 18
Journal: Information and Software Technology
Volume: 55
Issue number: 6
DOIs
Publication status: Published - Jun 2013
Externally published: Yes

Funding

This research is supported in part by (i) TEKES under the Cloud-SW Project and the Academy of Finland under Grant Decision No. 260871 (in Finland), (ii) NSERC Discovery Grant No. 402003-2012 (in Canada), and (iii) the Turkish State Planning Organization (DPT) under Project Number 2007K120610 (in Turkey). This work was initiated while Dr. Ayşe Tosun Mısırlı was with the Dept. of Computer Engineering at Boğaziçi University and completed after she moved to her current position. The authors would like to thank the anonymous reviewers for their insightful and constructive comments, which have significantly improved the manuscript.

Funders (funder number)
DPT: 2007K120610
Turkish State Planning Organization
Natural Sciences and Engineering Research Council of Canada: 402003-2012
Academy of Finland: 260871
