SE Apr 5, 2021

Predicting Crash Fault Residence via Simplified Deep Forest Based on A Reduced Feature Set

Kunsong Zhao, Jin Liu, Zhou Xu, Li Li, Meng Yan, Jiaojiao Yu, Yuxuan Zhou

arXiv:2104.01768v16.41 citations Has Code

Originality: Incremental advance

AI Analysis

This work aims to reduce developers' debugging effort by automatically predicting where a crashing fault resides. The advance is incremental: it pairs an existing deep forest model with a feature preprocessing step.

The paper tackles the problem of predicting where a crash-causing fault resides in software by proposing ConDF, a framework combining feature selection with a simplified deep forest model, which outperforms 17 baseline methods across three performance indicators on seven open-source projects.

Software inevitably crashes, and developers spend considerable effort finding the fault that caused the crash (the crashing fault). Developing automatic methods to identify where the crashing fault resides is therefore a crucial activity for software quality assurance. Researchers have proposed methods that predict whether the crashing fault resides in the stack trace, based on features collected from the stack trace and the faulty code, with the aim of saving developers debugging effort. However, previous work has usually neglected feature preprocessing of the crash data and has used only traditional classification models. In this paper, we propose a novel crashing fault residence prediction framework, called ConDF, which consists of a consistency-based feature subset selection method and a state-of-the-art deep forest model. First, the feature selection method obtains an optimal feature subset, reducing the feature dimension by retaining the representative features. Then, a simplified deep forest model is used to build the classification model on the reduced feature set. Experiments on seven open-source software projects show that ConDF performs significantly better than 17 baseline methods on three performance indicators.
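The idea behind consistency-based feature subset selection can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the search strategy, so the greedy forward search, the function names, the toy data, and the "inside"/"outside" labels below are all assumptions made for illustration. The sketch also assumes discrete feature values; continuous features would need discretizing first. A subset counts as consistent when instances that agree on all selected features also agree on the class label.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Share of instances that disagree with the majority label of their group.

    A group is the set of rows with identical values on the features in `subset`.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    # Within each group, every instance outside the majority class is a clash.
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(rows)


def select_consistent_subset(rows, labels):
    """Greedy forward search (an assumed strategy, not taken from the paper).

    Repeatedly adds the feature that lowers the inconsistency rate the most,
    stopping once the subset is as consistent as the full feature set.
    """
    n_features = len(rows[0])
    target = inconsistency_rate(rows, labels, list(range(n_features)))
    chosen = []
    while inconsistency_rate(rows, labels, chosen) > target:
        remaining = [f for f in range(n_features) if f not in chosen]
        best = min(
            remaining,
            key=lambda f: inconsistency_rate(rows, labels, chosen + [f]),
        )
        chosen.append(best)
    return sorted(chosen)


# Hypothetical toy data: three discrete features per crash instance.
# The label says whether the fault lies inside or outside the stack trace.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = ["inside", "inside", "outside", "outside"]

selected = select_consistent_subset(rows, labels)
print(selected)
```

In this toy data only the first feature separates the two classes, so the other two can be dropped without making the data any less consistent. The reduced feature set would then be passed to the classifier, which in ConDF is the simplified deep forest.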

