A Methodological Analysis of Empirical Studies in Quantum Software Testing
For researchers in quantum software engineering, this work highlights the lack of a standardized methodology in QST empirical studies and offers guidance to improve reproducibility and comparability.
This paper systematically analyzes 59 empirical studies in quantum software testing (QST), selected from a pool of 384, and identifies methodological limitations and inconsistencies across key dimensions such as test objects, baselines, and experimental setups. The authors provide recommendations to improve the design and reporting of future QST empirical studies.
In quantum software engineering (QSE), quantum software testing (QST) has attracted increasing attention as quantum software systems grow in scale and complexity. Because QST evaluates quantum programs by executing them on designed test inputs, empirical studies are widely used to assess the effectiveness of testing approaches. However, the design and reporting of empirical studies in QST remain highly diverse, and no shared methodological understanding has yet emerged, which makes it difficult to interpret results and compare findings across studies. This paper presents a methodological analysis of empirical studies in QST through a systematic examination of 59 primary studies identified from a pool of 384 candidate papers. We organize our analysis around ten research questions that cover key methodological dimensions of QST empirical studies, including objects under test, baseline comparison, testing setup, experimental configuration, and tool and artifact support. Through cross-study analysis along these dimensions, we characterize current empirical practices in QST, identify recurring limitations and inconsistencies, and highlight open methodological challenges. Based on our findings, we derive insights and recommendations to inform the design, execution, and reporting of future empirical studies in QST.