SE, Nov 16, 2014

Towards Cross-Project Defect Prediction with Imbalanced Feature Sets

arXiv:1411.4228v1, 55 citations

Originality: Incremental advance

AI Analysis

This work addresses a practical issue in software quality assurance for new or inactive projects, but the advance is incremental because it builds on existing cross-project defect prediction (CPDP) methods.

The paper tackles cross-project defect prediction with imbalanced feature sets, proposing a distribution characteristic-based instance mapping method and showing that it improves prediction performance on three public datasets.

Cross-project defect prediction (CPDP) is regarded as an emerging technology for software quality assurance, especially in new or inactive projects, and several improved methods have been proposed to support better defect prediction. However, regular CPDP always assumes that the training and test data share identical features, so very little is known about whether CPDP with imbalanced feature sets (CPDP-IFS) works well. Considering the diversity of defect data sets available on the Internet and the high cost of labeling data, in this paper we propose a simple approach based on a distribution characteristic-based instance (object class) mapping, and demonstrate its validity on three public defect data sets (PROMISE, ReLink, and AEEEM). In addition, the empirical results indicate that a hybrid model composed of CPDP and CPDP-IFS improves the prediction performance of regular CPDP to some extent.
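
The idea of a distribution characteristic-based instance mapping can be illustrated with a minimal sketch. The abstract does not list which characteristics are used, so the six statistics below (mean, median, minimum, maximum, range, standard deviation), the function names `map_instance` and `map_dataset`, and the toy data are all assumptions made for illustration, not the authors' implementation. The point is only that instances described by different numbers of metrics end up as vectors of the same length, so a model trained on one project can be applied to another.

```python
import statistics
from typing import List, Sequence


def map_instance(metrics: Sequence[float]) -> List[float]:
    """Summarize one instance (object class) by the distribution of its metric values.

    The set of characteristics here is a hypothetical choice; the source
    does not specify which ones the method uses.
    """
    lo, hi = min(metrics), max(metrics)
    return [
        statistics.mean(metrics),    # central tendency
        statistics.median(metrics),
        lo,
        hi,
        hi - lo,                     # range
        statistics.pstdev(metrics),  # population standard deviation
    ]


def map_dataset(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map every instance of a data set into the shared characteristic space."""
    return [map_instance(row) for row in rows]


# Toy data: the source project has 4 metrics per instance, the target only 2.
source = [[10.0, 2.0, 3.0, 1.0], [50.0, 8.0, 12.0, 4.0]]
target = [[7.0, 1.0], [30.0, 5.0]]

source_mapped = map_dataset(source)
target_mapped = map_dataset(target)

# Both projects now share the same 6-dimensional representation.
print(len(source_mapped[0]), len(target_mapped[0]))
```

After this mapping, any standard classifier could be trained on `source_mapped` with the source labels and applied to `target_mapped`. In practice, metrics on very different scales would likely need normalization first, which this sketch omits.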
