Marcos Lordello Chaim

SE · Jun 27, 2019 · Code

Evaluating data-flow coverage in spectrum-based fault localization

Henrique Lemos Ribeiro, Higor Amario de Souza, Roberto Paulo de Andrioli Araujo et al.

Background: Debugging is a key task during the software development cycle. Spectrum-based Fault Localization (SFL) is a promising technique to improve and automate debugging. SFL techniques use control-flow spectra to pinpoint the most suspicious program elements. However, data-flow spectra provide more detailed information about the program execution, which may be useful for fault localization. Aims: We evaluate the effectiveness and efficiency of ten SFL ranking metrics using data-flow spectra. Method: We compare the performance of data- and control-flow spectra for SFL using 163 faults from 5 real-world open source programs, which contain from 468 to 4130 test cases. The data- and control-flow spectra types used in our evaluation are definition-use associations (DUAs) and lines, respectively. Results: Using data-flow spectra, up to 50% more faults are ranked in the top-15 positions compared to control-flow spectra. Also, most SFL ranking metrics are more effective with data-flow spectra when inspecting up to the top-40 positions. The execution cost of data-flow spectra is higher than that of control-flow spectra, taking from 22 seconds to less than 9 minutes. Data-flow has an average overhead of 353% for all programs, while the average overhead for control-flow is 102%. Conclusions: The results suggest that SFL techniques can benefit from using data-flow spectra to rank faults in better positions, which may lead developers to inspect less code to find bugs. The execution cost of gathering data-flow spectra is higher than that of control-flow spectra, but it is not prohibitive. Moreover, data-flow spectra also provide information about suspicious variables for fault localization, which may improve developers' performance when using SFL.
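To make the ranking step concrete, here is a minimal sketch of how an SFL metric scores program elements from a spectrum. The abstract does not name its ten metrics; Ochiai is used here only as one widely used example, and the element names, test ids, and the `rank` helper are hypothetical. The same code works for lines or DUAs, since a spectrum is just a mapping from element to the tests that cover it.

```python
import math

def ochiai(ef, ep, nf):
    """Ochiai suspiciousness.
    ef: failing tests that cover the element
    ep: passing tests that cover the element
    nf: failing tests that do not cover the element
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def rank(spectra, outcomes):
    """spectra: element -> set of test ids covering it.
    outcomes: test id -> True if the test passed.
    Returns (element, score) pairs, most suspicious first."""
    failing = {t for t, passed in outcomes.items() if not passed}
    scores = {}
    for elem, tests in spectra.items():
        ef = len(tests & failing)
        ep = len(tests - failing)
        nf = len(failing - tests)
        scores[elem] = ochiai(ef, ep, nf)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: two lines and one DUA written as (variable, def, use).
outcomes = {"t1": True, "t2": True, "t3": False}
spectra = {
    "line 3": {"t1", "t2", "t3"},
    "line 5": {"t2", "t3"},
    "(x, 2, 5)": {"t3"},
}
ranking = rank(spectra, outcomes)
```

In this toy data the DUA is covered only by the failing test, so it ranks above both lines. This illustrates how a finer-grained spectrum can separate elements that line coverage lumps together; it is not a result from the paper.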

SE · Jan 15, 2021

A Data Flow Analysis Framework for Data Flow Subsumption

Marcos Lordello Chaim, Kesina Baral, Jeff Offutt

Data flow testing creates test requirements as definition-use (DU) associations, where a definition is a program location that assigns a value to a variable and a use is a location where that value is accessed. Data flow testing is expensive, largely because of the number of test requirements. Luckily, many DU-associations are redundant in the sense that if one test requirement (e.g., node, edge, DU-association) is covered, other DU-associations are guaranteed to also be covered. This relationship is called subsumption. Thus, testers can save resources by only covering DU-associations that are not subsumed by other testing requirements. In this work, we formally describe the Data Flow Subsumption Framework (DSF) conceived to tackle the data flow subsumption problem. We show that DSF is a distributive data flow analysis framework, which allows efficient iterative algorithms to find the Meet-Over-All-Paths (MOP) solution for DSF transfer functions. The MOP solution implies that the results at a point $p$ are valid for all paths that reach $p$. We also present an algorithm, called Subsumption Algorithm (SA), that uses DSF transfer functions and iterative algorithms to find the local DU-associations-node subsumption; that is, the set of DU-associations that are covered whenever a node $n$ is toured by a test. A proof of SA's correctness is presented and its complexity is analyzed.
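As a rough illustration of what DU-associations are and how an iterative data flow analysis finds them, the sketch below runs the classic reaching-definitions analysis on a tiny control-flow graph. This is not the DSF framework or the Subsumption Algorithm from the paper. It is the textbook analysis, with simplifying assumptions: uses are attached to nodes only (no uses on edges), and a use in a node is assumed to happen before any definition in the same node. The graph and all names are hypothetical.

```python
def reaching_definitions(succ, defs):
    """succ: node -> list of successor nodes.
    defs: node -> set of variables defined at that node.
    Returns node -> set of (variable, defining node) pairs reaching its entry."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for n in nodes:
            # Merge the definitions flowing in from all predecessors.
            in_n = set().union(*(out_sets[p] for p in pred[n]))
            # Transfer function: kill redefined variables, add local definitions.
            out_n = {(v, d) for (v, d) in in_n if v not in defs[n]}
            out_n |= {(v, n) for v in defs[n]}
            if in_n != in_sets[n] or out_n != out_sets[n]:
                in_sets[n], out_sets[n] = in_n, out_n
                changed = True
    return in_sets

def du_associations(succ, defs, uses):
    """Returns (variable, def node, use node) triples."""
    in_sets = reaching_definitions(succ, defs)
    return {(v, d, n) for n in succ for (v, d) in in_sets[n] if v in uses[n]}

# Hypothetical CFG: 1: x = read(); 2: if x > 0; 3: x = x + 1; 4: print(x)
succ = {1: [2], 2: [3, 4], 3: [4], 4: []}
defs = {1: {"x"}, 2: set(), 3: {"x"}, 4: set()}
uses = {1: set(), 2: {"x"}, 3: {"x"}, 4: {"x"}}
duas = du_associations(succ, defs, uses)
```

In this graph, any test that covers `("x", 3, 4)` must follow the path 1, 2, 3, 4, so it also covers `("x", 1, 2)` and `("x", 1, 3)`. That is the kind of redundancy the abstract calls subsumption.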
