8.4SEApr 29Code
Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J DatasetAdam Krafczyk, Klaus Schmid
In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an evidence-based comparison of different APR approaches. In our own work on APR we found significant challenges when working with widely used defect datasets, which go beyond mere repeatability of defects via test cases. We summarize these identified challenges and related lessons learned to bring them to the attention of the APR community and quantify the potential impact of them. In particular, we investigate the widely used benchmark Defects4J, which has according to Google Scholar over 1,800 citations. It consists of 835 defects from 17 open-source Java projects; a hand-curated collection of defects, test suites that clearly indicate the defect, and human patches where any unrelated changes are removed. We find that, when executing the test suites with strict requirements for reproducibility in APR settings (beyond merely reproducing the defect via test cases), 180 (21.6 %) of the defects are not suitable for evaluation experiments. Further, we find that an additional 59 (7.1 %) defects have test suites that are obviously under-specified, as deleting a single statement from the code base makes all test cases pass, although the human-written patch does not only delete code. Our contributions are: a systematic collection of requirements for defect datasets for APR beyond traditional reproducibility of defects, a description of practical experiences and quantitative analysis of problems with the Defects4J dataset, as well as an implementation of an evaluation framework for APR tools for Java programs. This evaluation framework does stricter checking for indications of inadequate test suites, to avoid otherwise unnoticed problems in the test suite, such as flaky tests.
SEOct 19, 2021Code
MetricHaven -- More Than 23,000 Metrics for Measuring Quality Attributes of Software Product LinesSascha El-Sharkawy, Adam Krafczyk, Klaus Schmid
Variability-aware metrics are designed to measure qualitative aspects of software product lines. As we identified in a prior SLR \cite{El-SharkawyYamagishi-EichlerSchmid19}, there exist already many metrics that address code or variability separately, while the combination of both has been less researched. MetricHaven fills this gap, as it extensively supports combining information from code files and variability models. Further, we also enable the combination of well established single system metrics with novel variability-aware metrics, going beyond existing variability-aware metrics. Our tool supports most prominent single system and variability-aware code metrics. We provide configuration support for already implemented metrics, resulting in 23,342 metric variations. Further, we present an abstract syntax tree developed for MetricHaven, that allows the realization of additional code metrics. Tool: https://github.com/KernelHaven/MetricHaven Video: https://youtu.be/vPEmD5Sr6gM
SEOct 12, 2021
Fast Static Analyses of Software Product Lines -- An Example With More Than 42,000 MetricsSascha El-Sharkawy, Adam Krafczyk, Klaus Schmid
Context: Software metrics, as one form of static analyses, is a commonly used approach in software engineering in order to understand the state of a software system, in particular to identify potential areas prone to defects. Family-based techniques extract variability information from code artifacts in Software Product Lines (SPLs) to perform static analysis for all available variants. Many different types of metrics with numerous variants have been defined in literature. When counting all metrics including such variants, easily thousands of metrics can be defined. Computing all of them for large product lines can be an extremely expensive process in terms of performance and resource consumption. Objective: We address these performance and resource challenges while supporting customizable metric suites, which allow running both, single system and variability-aware code metrics. Method: In this paper, we introduce a partial parsing approach used for the efficient measurement of more than 42,000 code metric variations. The approach covers variability information and restricts parsing to the relevant parts of the Abstract Syntax Tree (AST). Conclusions: This partial parsing approach is designed to cover all relevant information to compute a broad variety of variability-aware code metrics on code artifacts containing annotation-based variability, e.g., realized with C-preprocessor statements. It allows for the flexible combination of single system and variability-aware metrics, which is not supported by existing tools. This is achieved by a novel representation of partially parsed product line code artifacts, which is tailored to the computation of the metrics. Our approach consumes considerably less resources, especially when computing many metric variants in parallel.
SEOct 12, 2021
Reverse Engineering Code Dependencies: Converting Integer-Based Variability to Propositional LogicAdam Krafczyk, Sascha El-Sharkawy, Klaus Schmid
A number of SAT-based analysis concepts and tools for software product lines exist, that extract code dependencies in propositional logic from the source code assets of the product line. On these extracted conditions, SAT-solvers are used to reason about the variability. However, in practice, a lot of software product lines use integer-based variability. The variability variables hold integer values, and integer operators are used in the conditions. Most existing analysis tools can not handle this kind of variability; they expect pure Boolean conditions. This paper introduces an approach to convert integer-based variability conditions to propositional logic. Running this approach as a preparation on an integer-based product line allows the existing SAT-based analyses to work without any modifications. The pure Boolean formulas, that our approach builds as a replacement for the integer-based conditions, are mostly equivalent to the original conditions with respect to satisfiability. Our approach was motivated by and implemented in the context of a real-world industrial case-study, where such a preparation was necessary to analyze the variability. Our contribution is an approach to convert conditions, that use integer variables, into propositional formulas, to enable easy usage of SAT-solvers on the result. It works well on restricted variables (i.e. variables with a small range of allowed values); unrestricted integer variables are handled less exact, but still retain useful variability information.
SEOct 12, 2021
Reverse Engineering Variability in an Industrial Product Line: Observations and Lessons LearnedSascha El-Sharkawy, Dhar Saura Jyoti, Adam Krafczyk et al.
Ideally, a variability model is a correct and complete representation of product line features and constraints among them. Together with a mapping between features and code, this ensures that only valid products can be configured and derived. However, in practice the modeled constraints might be neither complete nor correct, which causes problems in the configuration and product derivation phases. This paper presents an approach to reverse engineer variability constraints from the implementation, and thus improve the correctness and completeness of variability models. We extended the concept of feature effect analysis to extract variability constraints from code artifacts of the Bosch PS-EC large-scale product line. We present an industrial application of the approach and discuss its required modifications to handle non-Boolean variability and heterogeneous artifact types.
SEOct 12, 2021
An Empirical Study of Configuration Mismatches in LinuxSascha El-Sharkawy, Adam Krafczyk, Klaus Schmid
Ideally the variability of a product line is represented completely and correctly by its variability model. However, in practice additional variability is often represented on the level of the build system or in the code. Such a situation may lead to inconsistencies, where the actually realized variability does not fully correspond to the one described by the variability model. In this paper we focus on configuration mismatches, i.e., cases where the effective variability differs from the variability as it is represented by the variability model. While previous research has already shown that these situations still exist even today in well-analyzed product lines like Linux, so far it was unclear under what circumstances such issues occur in reality. In particular, it is open what types of configuration mismatches occur and how severe they are. Here, our contribution is to close this gap by presenting a detailed manual analysis of 80 configuration mismatches in the Linux 4.4.1 kernel and assess their criticality. We identify various categories of configuration issues and show that about two-thirds of the configuration mismatches may actually lead to kernel misconfigurations.