Serge Demeyer

h-index: 9

15 papers

146 citations

Novelty: 34%

AI Score: 27

Ranked #162,330 of 201,018 authors (top 81%); #2,170 in SE (top 63%)

15 Papers

SE · Dec 21, 2021 · Code

AmPyfier: Test Amplification in Python

Ebert Schoofs, Mehrdad Abdi, Serge Demeyer

Test amplification is a method to extend handwritten tests into a more rigorous test suite covering corner cases in the system under test. Unfortunately, the current state of the art in test amplification relies heavily on program analysis techniques, which benefit greatly from the explicit type declarations present in statically typed languages like Java and C++. In dynamically typed languages, such type declarations are not available, and as a consequence test amplification has yet to find its way to programming languages like Python, Ruby and JavaScript. In this paper, we present AmPyfier, a proof-of-concept tool that brings test amplification to the dynamically typed, interpreted language Python. We evaluated this approach on 7 open-source projects and found that AmPyfier could successfully strengthen 7 out of 10 test classes (70%). As such, we demonstrate that test amplification is feasible for one of the most popular programming languages in use today.
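To make "amplification" concrete, here is a minimal sketch of the two ingredients test amplification typically combines: input amplification (perturbing a literal from an existing test to reach corner cases) and assertion amplification (running the code and freezing the observed result as a new assertion). It is not AmPyfier's implementation; the `clamp` function, the helpers `amplify_inputs` and `amplify_assertions`, and the perturbation rules are all hypothetical.

```python
from typing import Callable, List


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical system under test."""
    return max(low, min(value, high))


def amplify_inputs(literal: int) -> List[int]:
    """Input amplification: derive corner-case variants of an integer literal."""
    variants = {literal + 1, literal - 1, 0, -literal, literal * 2}
    variants.discard(literal)  # keep only genuinely new inputs
    return sorted(variants)


def amplify_assertions(func: Callable[..., int], args: tuple) -> str:
    """Assertion amplification: run the code and record the observed result."""
    observed = func(*args)
    arg_text = ", ".join(repr(a) for a in args)
    return f"assert {func.__name__}({arg_text}) == {observed!r}"


# Handwritten seed test: assert clamp(5, 0, 10) == 5
new_tests = [amplify_assertions(clamp, (v, 0, 10)) for v in amplify_inputs(5)]
for line in new_tests:
    print(line)
```

A real tool would additionally keep only the amplified tests that strengthen the suite (for example, by killing extra mutants); that filtering step is omitted here.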

SE · Apr 23, 2021 · Code

Comparing Mutation Coverage Against Branch Coverage in an Industrial Setting

Ali Parsai, Serge Demeyer

The state of the practice in software development is driven by constant change fueled by continuous integration servers. Such constant change demands frequent, fully automated tests capable of detecting faults immediately upon project build. As the fault detection capability of the test suite becomes so important, modern software development teams continuously monitor the quality of the test suite as well. However, practitioners appear reluctant to adopt strong coverage metrics (namely mutation coverage), relying instead on weaker kinds of coverage (namely branch coverage). In this paper, we investigate three reasons that hinder the adoption of mutation coverage in a continuous integration setting: (1) the difficulty of integrating it into the build system, (2) the perception that branch coverage is "good enough", and (3) the performance overhead during the build. Our investigation is based on a case study involving four open-source systems and one industrial system. We demonstrate that mutation coverage reveals additional weaknesses in the test suite compared to branch coverage, and that it does so with an acceptable performance overhead during project build.
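The gap between the two metrics can be illustrated with a toy example (a hypothetical function and hand-written mutants, not taken from the study): two tests reach both branches of `is_adult`, so branch coverage is complete, yet half of the mutants survive.

```python
from typing import Callable, Dict, List, Tuple


def is_adult(age: int) -> bool:
    """Toy system under test with one if/else (two branches)."""
    if age >= 18:
        return True
    return False


# Hand-written mutants of the condition `age >= 18` (hypothetical operators).
MUTANTS: Dict[str, Callable[[int], bool]] = {
    "ge_to_gt": lambda age: age > 18,
    "ge_to_lt": lambda age: age < 18,
    "ge_to_le": lambda age: age <= 18,
    "const_18_to_19": lambda age: age >= 19,
}

# Each test case is (input, expected output); these two reach both branches.
BRANCH_ADEQUATE_TESTS: List[Tuple[int, bool]] = [(30, True), (10, False)]


def mutation_score(tests: List[Tuple[int, bool]]) -> float:
    """Fraction of mutants killed, i.e. failing at least one test case."""
    killed = [
        name
        for name, mutant in MUTANTS.items()
        if any(mutant(arg) != expected for arg, expected in tests)
    ]
    return len(killed) / len(MUTANTS)


print(mutation_score(BRANCH_ADEQUATE_TESTS))                 # 0.5
print(mutation_score(BRANCH_ADEQUATE_TESTS + [(18, True)]))  # 1.0
```

Both surviving mutants differ from the original only at the `age == 18` boundary, which branch coverage never asks for; a single boundary test kills them.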

SE · Apr 8, 2020 · Code

Do Null-Type Mutation Operators Help Prevent Null-Type Faults?

Ali Parsai, Serge Demeyer

The null type is a major source of faults in Java programs, and its overuse has a severe impact on software maintenance. Unfortunately, traditional mutation testing operators do not cover null-type faults by default and hence cannot be used as a preventive measure. We address this problem by designing four new mutation operators that model null-type faults explicitly. We show how these mutation operators can reveal missing tests, and we demonstrate that they are useful in practice. For the latter, we analyze the test suites of 15 open-source projects to describe the trade-offs involved in adopting these operators to strengthen the test suite.

SE · Jul 28, 2018 · Code

Goal-Oriented Mutation Testing with Focal Methods

Sten Vercammen, Mohammad Ghafari, Serge Demeyer et al.

Mutation testing is the state-of-the-art technique for assessing the fault-detection capacity of a test suite. Unfortunately, mutation testing consumes enormous computing resources because it runs the whole test suite for each and every injected mutant. In this paper, we explore fine-grained traceability links at method level (named focal methods) to reduce the execution time of mutation testing and to verify the quality of the test cases for each individual method, rather than the overall test suite quality that is usually verified. Validation of our approach on the open-source Apache Ant project shows a speed-up of 573.5x for the mutants located in focal methods, with a quality score of 80%.

SE · Oct 5, 2016 · Code

A Model to Estimate First-Order Mutation Coverage from Higher-Order Mutation Coverage

Ali Parsai, Alessandro Murgia, Serge Demeyer

The test suite is essential for fault detection during software development. First-order mutation coverage is an accurate metric for quantifying the quality of the test suite. However, it is computationally expensive, which limits its adoption. In this study, we address this issue by proposing a realistic model able to estimate first-order mutation coverage using only higher-order mutation coverage. Our study shows how the estimation evolves with the order of mutation. We validate the model with an empirical study based on 17 open-source projects.
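The intuition behind such an estimate can be shown with a deliberately naive model, which is an assumption for illustration and not the paper's model: if each first-order mutant is killed independently with probability p, an order-k mutant survives only when all k constituent changes go undetected, so its kill rate is h = 1 - (1 - p)^k, which inverts to p = 1 - (1 - h)^(1/k). Real mutants violate both the independence and the no-masking assumptions.

```python
def hom_kill_rate(p: float, k: int) -> float:
    """Naive model: expected order-k kill rate given first-order kill rate p."""
    return 1.0 - (1.0 - p) ** k


def estimate_fom_coverage(h: float, k: int) -> float:
    """Invert the naive model: first-order coverage from order-k coverage h."""
    if not 0.0 <= h <= 1.0 or k < 1:
        raise ValueError("h must be in [0, 1] and k must be >= 1")
    return 1.0 - (1.0 - h) ** (1.0 / k)


# An observed second-order coverage of 0.91 maps back to p = 0.7.
print(round(estimate_fom_coverage(0.91, 2), 3))  # 0.7
```

In this model h approaches 1 quickly as k grows, so a raw higher-order score overstates test-suite quality unless it is mapped back to first order.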

SE · Mar 27, 2024

Cross-System Categorization of Abnormal Traces in Microservice-Based Systems via Meta-Learning

Yuqing Wang, Mika V. Mäntylä, Serge Demeyer et al.

Microservice-based systems (MSS) may fail with various fault types. While existing AIOps methods excel at detecting abnormal traces and locating the responsible service(s), human effort is still required to diagnose specific fault types and failure causes. This paper presents TraFaultDia, a novel AIOps framework that automatically classifies abnormal traces into fault categories for MSS. We treat the classification process as a series of multi-class classification tasks, where each task represents an attempt to classify abnormal traces into specific fault categories for an MSS. TraFaultDia leverages meta-learning to train on several abnormal trace classification tasks with a few labeled instances from an MSS, enabling quick adaptation to new, unseen abnormal trace classification tasks with a few labeled instances across MSS. TraFaultDia's use cases are scalable depending on how fault categories are built from anomalies within MSS. We evaluated TraFaultDia on two MSS, TrainTicket and OnlineBoutique, with open datasets where each fault category is linked to faulty system components (service/pod) and a root cause. TraFaultDia automatically classifies abnormal traces into these fault categories, thus enabling the automatic identification of faulty system components and root causes without manual analysis. TraFaultDia achieves 93.26% and 85.20% accuracy on 50 new classification tasks for TrainTicket and OnlineBoutique, respectively, when trained within the same MSS with 10 labeled instances per category. In the cross-system context, when TraFaultDia is applied to an MSS different from the one it was trained on, it achieves an average accuracy of 92.19% and 84.77% on the same set of 50 new, unseen abnormal trace classification tasks of the respective systems, also with 10 labeled instances provided for each fault category per task in each system.

SE · Aug 12, 2021

Small-Amp: Test Amplification in a Dynamically Typed Language

Mehrdad Abdi, Henrique Rocha, Serge Demeyer et al.

Some test amplification tools extend a manually created test suite with additional test cases to increase code coverage. The technique is effective in the sense that it suggests strong and understandable test cases, which software engineers generally adopt. Unfortunately, the current state of the art in test amplification relies heavily on program analysis techniques, which benefit greatly from the explicit type declarations present in statically typed languages. In dynamically typed languages, such type declarations are not available, and as a consequence test amplification has yet to find its way to programming languages like Smalltalk, Python, Ruby and JavaScript. We propose to exploit profiling information (readily obtainable by executing the associated test suite) to infer the type information needed to create special test inputs with corresponding assertions. We evaluated this approach on 52 selected test classes from 13 mature projects in the Pharo ecosystem, containing approximately 400 test methods. We show improvements in newly killed mutants and in mutation coverage in at least 28 out of 52 test classes (53%). Moreover, the generated tests are understandable by humans: 8 out of 11 submitted pull requests were merged into the main code base (72%). These results are comparable to the state of the art, hence we conclude that test amplification is feasible for dynamically typed languages.
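Small-Amp targets Pharo (Smalltalk), but the profiling idea carries over to any dynamic language. As an illustration only, and not Small-Amp's mechanism, here is a Python sketch in which a hypothetical `profile_types` decorator records the concrete argument and return types observed while an existing test runs; an amplifier could then use such records to pick type-appropriate new inputs.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Set, Tuple

# Observed types per function: name -> {(positional argument types, return type)}.
OBSERVED: DefaultDict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def profile_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record runtime types of positional arguments and the return value."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        arg_types = tuple(type(a).__name__ for a in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result
    return wrapper


@profile_types
def greet(name, times):  # deliberately untyped, as in dynamic code
    return " ".join([f"hello {name}"] * times)


# Running the existing handwritten test populates the type profile.
assert greet("ada", 2) == "hello ada hello ada"
print(OBSERVED["greet"])  # {(('str', 'int'), 'str')}
```

Keyword arguments are ignored in this sketch for brevity; a fuller profiler would record them as well.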

SE · Apr 25, 2021

Mutant Density: A Measure of Fault-Sensitive Complexity

Ali Parsai, Serge Demeyer

Software code complexity is a well-studied property for determining software component health. However, existing code complexity metrics do not directly take into account the fault-proneness aspect of the code. We propose a metric called mutant density, where we use mutation as a method to introduce artificial faults into code and count the number of possible mutations per line. We show how this metric can be used to perform helpful analysis of real-life software projects.
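As a rough illustration of "possible mutations per line", the sketch below counts candidate mutation sites per source line using Python's `ast` module. The operator set (binary arithmetic, comparisons, boolean operators, numeric constants) is a hypothetical choice for the example; the operators and counting rules used in the paper may differ.

```python
import ast
from collections import Counter


def mutant_density(source: str) -> Counter:
    """Count candidate mutation sites per line (hypothetical operator set)."""
    sites: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp):            # a + b -> a - b, ...
            sites[node.lineno] += 1
        elif isinstance(node, ast.Compare):        # a > b -> a >= b, ...
            sites[node.lineno] += len(node.ops)
        elif isinstance(node, ast.BoolOp):         # a and b -> a or b
            sites[node.lineno] += len(node.values) - 1
        elif (isinstance(node, ast.Constant)
              and isinstance(node.value, (int, float))
              and not isinstance(node.value, bool)):  # 100 -> 101, ...
            sites[node.lineno] += 1
    return sites


SOURCE = """\
def fee(amount, vip):
    if amount > 100 and not vip:
        return amount * 0.05 + 2
    return 0
"""
print(sorted(mutant_density(SOURCE).items()))  # [(2, 3), (3, 4), (4, 1)]
```

Line 3 scores highest because it packs two arithmetic operators and two constants into one statement, which is the kind of locally dense, fault-sensitive code the metric is meant to surface.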

SE · Apr 20, 2020

Software Test Automation Maturity -- A Survey of the State of the Practice

Yuqing Wang, Mika V. Mäntylä, Serge Demeyer et al.

The software industry has seen an increasing interest in test automation. In this paper, we present a test automation maturity survey serving as a self-assessment for practitioners. Based on responses from 151 practitioners in more than 101 organizations across 25 countries, we make the following observations regarding the state of the practice of test automation maturity: a) the level of test automation maturity in different organizations is differentiated by the practices they adopt; b) practitioners reported quite diverse situations with respect to different practices, e.g., 85% of practitioners agreed that their test teams have enough test automation expertise and skills, while 47% admitted that there is a lack of guidelines on designing and executing automated tests; c) some practices are strongly correlated and/or closely clustered; d) the percentage of automated test cases and the use of Agile and/or DevOps development models are good indicators of a higher test automation maturity level; e) the roles of practitioners may affect response variation, e.g., QA engineers give the most optimistic answers and consultants the most pessimistic ones. Our results give insight into present test automation processes and practices and indicate opportunities for further improvement in current industry practice.

SE · Apr 8, 2020

C++11/14 Mutation Operators Based on Common Fault Patterns

Ali Parsai, Serge Demeyer, Seph De Busser

The C++11/14 standard offers a wealth of features aimed at helping programmers write better code. Unfortunately, some of these features may cause subtle programming faults that are likely to go unnoticed during code reviews. In this paper, we propose four new mutation operators for C++11/14 based on common fault patterns, which make it possible to verify whether a unit test suite is capable of testing against such faults. We validate the relevance of the proposed mutation operators by performing a case study on seven real-life software systems.

SE · Sep 7, 2018

Dynamic Mutant Subsumption Analysis using LittleDarwin

Ali Parsai, Serge Demeyer

Many academic studies in the field of software testing use mutation testing as their comparison criterion. However, recent studies have shown that redundant mutants have a significant effect on the accuracy of such results. One solution to this problem is to use mutant subsumption to detect redundant mutants. Therefore, to facilitate research in this field, a mutation testing tool capable of detecting redundant mutants is needed. In this paper, we describe how we improved our tool, LittleDarwin, to fulfill this requirement.
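Dynamic subsumption is commonly defined over a kill matrix: mutant a subsumes mutant b when a is killed by at least one test and every test that kills a also kills b. Any test suite that kills all non-subsumed ("minimal") mutants therefore kills every killed mutant, so the rest are redundant for scoring. The sketch below uses a hypothetical kill matrix and is not LittleDarwin's implementation or output format.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical kill matrix: mutant -> set of tests that kill it.
KILLS: Dict[str, FrozenSet[str]] = {
    "m1": frozenset({"t1"}),
    "m2": frozenset({"t1", "t2"}),
    "m3": frozenset({"t2", "t3"}),
    "m4": frozenset({"t3"}),
    "m5": frozenset({"t1"}),  # same kill set as m1: a duplicate
    "m6": frozenset(),        # never killed: excluded from the analysis
}


def subsumes(a: str, b: str) -> bool:
    """a dynamically subsumes b if a is killed and every test killing a kills b."""
    return bool(KILLS[a]) and KILLS[a] <= KILLS[b]


def minimal_mutants() -> List[str]:
    """Keep one representative per kill set that no other mutant strictly subsumes."""
    killed = sorted(m for m, tests in KILLS.items() if tests)
    minimal: List[str] = []
    seen: Set[FrozenSet[str]] = set()
    for m in killed:
        strictly_subsumed = any(
            subsumes(o, m) and KILLS[o] != KILLS[m] for o in killed
        )
        if not strictly_subsumed and KILLS[m] not in seen:
            minimal.append(m)
            seen.add(KILLS[m])
    return minimal


print(minimal_mutants())  # ['m1', 'm4']
```

Here four of the five killed mutants are redundant: any test killing m1 also kills m2 and m5, and any test killing m4 also kills m3.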

SE · Jul 4, 2017

LittleDarwin: a Feature-Rich and Extensible Mutation Testing Framework for Large and Complex Java Systems

Ali Parsai, Alessandro Murgia, Serge Demeyer

Mutation testing is a well-studied method for increasing the quality of a test suite. We designed LittleDarwin as a mutation testing framework able to cope with large and complex Java software systems, while still being easily extensible with new experimental components. LittleDarwin addresses two existing problems in the domain of mutation testing: having a tool able to work within an industrial setting, yet open to extension with cutting-edge techniques provided by academia. LittleDarwin already offers higher-order mutation, null-type mutants, mutant sampling, manual mutation, and mutant subsumption analysis. No other tool available today offers all these features while being able to work with typical industrial software systems.

SE · Jul 8, 2016

Evaluating Random Mutant Selection at Class-Level in Projects with Non-Adequate Test Suites

Ali Parsai, Alessandro Murgia, Serge Demeyer

Mutation testing is a standard technique to evaluate the quality of a test suite. Due to its computationally intensive nature, many approaches have been proposed to make this technique feasible in real-world scenarios. Among these approaches, uniform random mutant selection has been shown to be simple and promising. However, existing work in this area analyzes mutant samples at project level, mainly on projects with adequate test suites. In this paper, we fill this gap in empirical validation by analyzing random mutant selection at class level on projects with non-adequate test suites. First, we show that uniform random mutant selection underachieves the expected results. Then, we propose a new approach, named weighted random mutant selection, which generates more representative mutant samples. Finally, we show that representative mutant samples are larger for projects with high test adequacy.

SE · Jun 24, 2015

Mutation Testing as a Safety Net for Test Code Refactoring

Ali Parsai, Alessandro Murgia, Quinten David Soetens et al.

Refactoring is an activity that improves the internal structure of the code without altering its external behavior. When refactoring is performed on production code, the tests can be used to verify that the external behavior of the production code is preserved. However, when refactoring is performed on test code, there is no safety net that ensures the external behavior of the test code is preserved. In this paper, we propose to adopt mutation testing as a means to verify whether the behavior of the test code is preserved after refactoring. Moreover, we show how this approach can be used to identify the part of the test code that was improperly refactored.
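The core check can be sketched as a comparison of kill matrices: run the same mutants against the test code before and after refactoring, then report mutants that are no longer killed and the tests that lost kills, which point at the likely improperly refactored part. This is a minimal sketch with hypothetical mutant IDs and test names; the paper's own procedure may differ.

```python
from typing import Dict, Set, Tuple

KillMatrix = Dict[str, Set[str]]  # mutant id -> tests that kill it


def compare_refactoring(before: KillMatrix, after: KillMatrix) -> Tuple[Set[str], Set[str]]:
    """Return (mutants no longer killed, tests that lost at least one kill)."""
    lost_mutants = {m for m, tests in before.items() if tests and not after.get(m)}
    suspect_tests: Set[str] = set()
    for m, tests in before.items():
        suspect_tests |= tests - after.get(m, set())
    return lost_mutants, suspect_tests


before = {"m1": {"test_add"}, "m2": {"test_add", "test_sum"}, "m3": {"test_div"}}
after = {"m1": set(), "m2": {"test_sum"}, "m3": {"test_div"}}
print(compare_refactoring(before, after))  # ({'m1'}, {'test_add'})
```

If the refactoring renames or splits tests, old and new test names have to be mapped onto each other before comparing; that step is omitted here.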

SE · Dec 11, 2014

Considering Polymorphism in Change-Based Test Suite Reduction

Ali Parsai, Quinten David Soetens, Alessandro Murgia et al.

With the increasing popularity of continuous integration, algorithms for selecting the minimal test suite that covers a given set of changes are in order. This paper reports on how taking polymorphism into account can handle false negatives in a previous algorithm that uses method-level changes in the base code to deduce which tests need to be rerun. We compare the approach with and without polymorphism on two distinct cases (PMD and CruiseControl) and discover an interesting trade-off: incorporating polymorphism results in more relevant tests being included in the test suite (hence improving accuracy), but comes at the cost of a larger test suite (hence increasing the time to run the minimal test suite).
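The false negative arises when a test's static call target is a base-class method while the change lands in an override. A minimal sketch of the selection rule, assuming a hypothetical class hierarchy and a static map from tests to the methods they call (not the original algorithm's data model): with polymorphism enabled, a change to `C.m` also selects tests that call `m` on any ancestor of `C`.

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy: class -> superclass (None for a root class).
SUPER: Dict[str, Optional[str]] = {"Shape": None, "Circle": "Shape", "Square": "Shape"}

# Static call targets recorded for each test, as they appear in the test source.
CALLS: Dict[str, Set[str]] = {
    "test_circle_area": {"Circle.area"},
    "test_total_area": {"Shape.area"},  # calls area() through a Shape reference
    "test_square_side": {"Square.side"},
}


def ancestors(cls: str) -> List[str]:
    """Walk up the superclass chain."""
    result: List[str] = []
    parent = SUPER.get(cls)
    while parent is not None:
        result.append(parent)
        parent = SUPER.get(parent)
    return result


def select_tests(changed: Set[str], polymorphism: bool) -> Set[str]:
    """Select tests whose static call targets intersect the changed methods."""
    targets = set(changed)
    if polymorphism:
        for method in changed:
            cls, name = method.rsplit(".", 1)
            targets |= {f"{a}.{name}" for a in ancestors(cls)}
    return {t for t, called in CALLS.items() if called & targets}


changed = {"Circle.area"}
print(sorted(select_tests(changed, polymorphism=False)))  # ['test_circle_area']
print(sorted(select_tests(changed, polymorphism=True)))   # ['test_circle_area', 'test_total_area']
```

The second call shows the trade-off in miniature: `test_total_area` may dispatch to the changed override at runtime, so including it removes a false negative, but the selected suite grows.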