SE · Aug 28, 2018

Coincidental Correctness in the Defects4J Benchmark

arXiv:1808.09233v4 · 3 citations
AI Analysis

This paper addresses the reliability of software testing for developers, but the contribution is incremental: it extends the authors' prior work to a broader benchmark.

The study investigates coincidental correctness in the Defects4J benchmark, finding that it is prevalent, that it is affected by the testing level, and that infections are often nullified outside the buggy method.

Coincidental correctness (CC) arises when a defective program produces the correct output even though the defect within it was exercised. Researchers have recognized the negative impact of coincidental correctness, and the authors have previously conducted a study demonstrating its prevalence in test suites. However, that study was limited to system tests and small subjects seeded with artificial defects. In this paper, we conduct a wider-scope study of CC that addresses the following research questions in the context of the Defects4J benchmark: RQ1: Is CC prevalent in Defects4J? RQ2: Is CC affected by the testing levels in Defects4J? RQ3: Do CC tests induce peculiar infection paths in Defects4J? RQ4: Are the infections likely to be nullified within or outside the buggy method? ....
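
To make the definition concrete, here is a minimal sketch of a coincidentally correct run. It assumes a toy program that is not drawn from Defects4J: the functions `double_fixed`, `double_buggy`, `clamp_to_ten`, and `classify`, as well as the label strings, are hypothetical names chosen for illustration. The defect is exercised on every input, yet some inputs still yield the correct output, either because no infection occurs or because the infected value is nullified by the caller, that is, outside the buggy function.

```python
def double_fixed(x: int) -> int:
    """Reference (correct) version: doubles its input."""
    return x * 2


def double_buggy(x: int) -> int:
    """Hypothetical seeded defect: adds 2 instead of multiplying by 2."""
    return x + 2


def clamp_to_ten(value: int) -> int:
    """Caller-side logic that can mask (nullify) an infected value."""
    return min(value, 10)


def classify(x: int) -> str:
    """Compare the buggy run against the reference run for one input."""
    expected_state = double_fixed(x)
    actual_state = double_buggy(x)  # the defect is executed on every input

    expected_output = clamp_to_ten(expected_state)
    actual_output = clamp_to_ten(actual_state)

    if actual_output != expected_output:
        return "failing"
    if actual_state != expected_state:
        # The intermediate state was wrong, but the caller masked it.
        return "cc: infection nullified outside buggy function"
    return "cc: defect executed, no infection"


if __name__ == "__main__":
    for x in (2, 3, 9):
        print(x, classify(x))
```

In this sketch, a test that only checks the final output with input 2 or 9 passes despite exercising the defect, which is the kind of test that can mislead coverage-based fault localization.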

Code Implementations: 1 repo