SE · Feb 11, 2021

Improving Fault Localization by Integrating Value and Predicate Based Causal Inference Techniques

Yigit Kucuk, Tim A. D. Henderson, Andy Podgurski

arXiv:2102.06292v1 · 59 citations

Originality: Incremental advance

AI Analysis

This work aims to improve fault localization accuracy for software developers. It is an incremental advance that combines existing causal inference techniques.

The paper addresses a limitation of statistical fault localization (SFL): most techniques measure correlation rather than causation and are therefore subject to confounding bias. It introduces UniVal, a technique that uses causal inference and machine learning to integrate predicate outcomes and variable values, with the goal of estimating the failure-causing effect of program statements more accurately. UniVal was empirically compared with several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions with real faults.

Statistical fault localization (SFL) techniques use execution profiles and success/failure information from software executions, in conjunction with statistical inference, to automatically score program elements based on how likely they are to be faulty. SFL techniques typically employ one type of profile data: either coverage data, predicate outcomes, or variable values. Most SFL techniques actually measure correlation, not causation, between profile values and success/failure, and so they are subject to confounding bias that distorts the scores they produce. This paper presents a new SFL technique, named UniVal, that uses causal inference techniques and machine learning to integrate information about both predicate outcomes and variable values to more accurately estimate the true failure-causing effect of program statements. UniVal was empirically compared to several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions with real faults.
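To make the idea of correlation-based scoring concrete, here is a minimal sketch of a coverage-based suspiciousness score. It uses the well-known Ochiai formula, which the abstract does not name; it is chosen here only as a representative coverage-based metric and is not UniVal. The function name `ochiai_scores`, the statement identifiers, and the toy runs are all hypothetical.

```python
import math

def ochiai_scores(coverage, failed):
    """Score each statement with the Ochiai formula (illustrative assumption).

    coverage: list of sets, one per run, holding the statement ids that run covered
    failed:   list of bools, True where the corresponding run failed
    """
    total_failed = sum(failed)
    statements = set().union(*coverage)
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing runs that executed this statement
        ef = sum(1 for cov, f in zip(coverage, failed) if f and stmt in cov)
        ep = sum(1 for cov, f in zip(coverage, failed) if not f and stmt in cov)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores

# Hypothetical toy data: four runs over three statements.
runs = [{"s1", "s2"}, {"s1", "s3"}, {"s1", "s2", "s3"}, {"s1"}]
failed = [True, False, True, False]
scores = ochiai_scores(runs, failed)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, `s2` ranks first because it is executed only in failing runs. A score of this kind captures association only: a statement that merely tends to execute alongside the faulty one receives a similarly high score, which is the confounding problem the abstract describes.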
