Zaheed Ahmed

4.9SEApr 24

Quality-Driven Selective Mutation for Deep Learning

Zaheed Ahmed, Emmanuel Charleson Dapaah, Philip Makedonski et al.

Mutants support testing and debugging in two roles: (i) as test goals and (ii) as substitutes for real faults. Hard-to-kill mutants provide better guidance for test improvement, while realism is essential when mutants are used to simulate real bugs. Building on these roles, selective mutation for deep learning (DL) aims to reduce the cost of mutant generation and execution by choosing operator configurations that yield resistant and realistic mutants. However, the DL literature lacks a unified measure that captures both aspects. This study presents a probabilistic framework to quantify mutant quality along two complementary axes: resistance and realism. Resistance adapts the classical notion of hard-to-kill mutants to the DL setting using statistical killing probabilities, while realism is measured via the generalized Jaccard similarity between mutant and real-fault detectability patterns. The framework enables ranking and filtering of low-quality mutation-operator configurations without assuming a specific use case. We empirically evaluate the approach on four datasets of real DL faults. Three datasets (CleanML, DeepFD, and DeepLocalize) are used to estimate and select high-quality operator configurations, and the held-out defect4ML dataset is used for validation. Results show that quality-driven selection reduces the number of generated mutants by up to 55.6% while preserving typical levels of resistance and realism under baseline-aligned selection thresholds. These findings confirm that dual-objective selection can lower cost without compromising the usefulness of mutants for either role.

SEApr 6, 2021

A new perspective on the competent programmer hypothesis through the reproduction of bugs with repeated mutations

Zaheed Ahmed, Eike Stein, Steffen Herbold et al.

The competent programmer hypothesis states that most programmers are competent enough to create correct or almost correct source code. Because this implies that bugs should usually manifest through small variations of the correct code, the competent programmer hypothesis is one of the fundamental assumptions of mutation testing. Unfortunately, it is still unclear if the competent programmer hypothesis holds and past research presents contradictory claims. Within this article, we provide a new perspective on the competent programmer hypothesis and its relation to mutation testing. We try to re-create real-world bugs through chains of mutations to understand if there is a direct link between mutation testing and bugs. The lengths of these paths help us to understand if the source code is really almost correct, or if large variations are required. Our results indicate that while the competent programmer hypothesis seems to be true, mutation testing is missing important operators to generate representative real-world bugs.

Zaheed Ahmed

2 Papers