Automatic Bug Triage using Semi-Supervised Text Classification
This work addresses the issue of insufficient labeled data for bug triage in software engineering, offering an incremental improvement over prior supervised approaches.
The paper tackles the problem of bug triage by proposing a semi-supervised text classification approach that combines naive Bayes and expectation-maximization to utilize both labeled and unlabeled bug reports, achieving higher classification accuracy than existing supervised methods on Eclipse bug reports.
In this paper, we propose a semi-supervised text classification approach for bug triage that addresses the shortage of labeled bug reports faced by existing supervised approaches. The approach combines a naive Bayes classifier with expectation-maximization (EM) to take advantage of both labeled and unlabeled bug reports. It first trains a classifier on a fraction of labeled bug reports. It then iteratively labels the numerous unlabeled bug reports and trains a new classifier using the labels of all bug reports. We also employ a weighted recommendation list to boost performance by incorporating the weights of multiple developers when training the classifier. Experimental results on Eclipse bug reports show that our approach outperforms existing supervised approaches in terms of classification accuracy.
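The training loop described above can be illustrated with a minimal sketch of the generic naive Bayes + EM scheme for text classification. It is not the authors' implementation and makes several assumptions: a multinomial naive Bayes model with Laplace smoothing, plain lowercase whitespace tokenization, and a fixed number of EM iterations with no convergence test. The weighted recommendation list is modeled here as a per-report `{developer: weight}` dict used as a soft label. All function names (`nb_em`, `recommend`, etc.), the developers, and the toy reports are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split (no stemming/stop words).
    return text.lower().split()


def normalize(weights):
    # Scale a {developer: weight} dict so the weights sum to 1.
    total = sum(weights.values())
    return {dev: w / total for dev, w in weights.items()}


def train_nb(docs, soft_labels, classes, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (possibly fractional) labels."""
    prior = {c: alpha for c in classes}
    counts = {c: defaultdict(float) for c in classes}
    totals = {c: 0.0 for c in classes}
    for doc, labels in zip(docs, soft_labels):
        for c, p in labels.items():
            prior[c] += p
            for word, n in doc.items():
                counts[c][word] += p * n
                totals[c] += p * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    # Laplace-smoothed log P(word | developer) over the shared vocabulary.
    log_lik = {
        c: {w: math.log((counts[c][w] + alpha) / (totals[c] + alpha * len(vocab)))
            for w in vocab}
        for c in classes
    }
    return log_prior, log_lik


def posterior(model, doc):
    """E-step for one report: P(developer | report) under the current model."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for word, n in doc.items():
            if word in log_lik[c]:  # ignore words outside the training vocabulary
                s += n * log_lik[c][word]
        scores[c] = s
    m = max(scores.values())  # log-sum-exp for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def nb_em(labeled, unlabeled, classes, iterations=10):
    """labeled: [(text, {developer: weight})]; unlabeled: [text]."""
    l_docs = [Counter(tokenize(t)) for t, _ in labeled]
    l_labels = [normalize(w) for _, w in labeled]
    u_docs = [Counter(tokenize(t)) for t in unlabeled]
    vocab = set().union(*l_docs, *u_docs)
    # Initial classifier trained on the labeled fraction only.
    model = train_nb(l_docs, l_labels, classes, vocab)
    for _ in range(iterations):
        # E-step: soft-label the unlabeled reports with the current classifier.
        u_labels = [posterior(model, d) for d in u_docs]
        # M-step: retrain on labeled + soft-labeled reports together.
        model = train_nb(l_docs + u_docs, l_labels + u_labels, classes, vocab)
    return model


def recommend(model, text, k=3):
    # Rank developers by posterior probability and return the top k.
    post = posterior(model, Counter(tokenize(text)))
    return sorted(post, key=post.get, reverse=True)[:k]


labeled = [
    ("crash in editor when saving file", {"alice": 1.0}),
    ("ui button layout broken in dialog", {"bob": 1.0}),
    # Hypothetical weighted label: alice fixed it, bob also contributed.
    ("editor crash on save", {"alice": 0.7, "bob": 0.3}),
]
unlabeled = ["editor freezes then crash", "dialog button misaligned"]
model = nb_em(labeled, unlabeled, classes=["alice", "bob"])
print(recommend(model, "crash while saving in editor", k=2))
```

In this sketch the labeled reports keep their fixed weights on every pass, while the E-step recomputes soft labels for the unlabeled reports each iteration. `recommend` returns a ranked list of developers rather than a single assignee, which is one plausible way to read "recommendation list."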