19.8SEApr 9
Modeling Sampling Workflows for Code RepositoriesRomain Lefeuvre, Maïwenn Le Goasteller, Jessie Galasso et al.
Empirical software engineering research often depends on datasets of code repository artifacts, where sampling strategies are employed to enable large-scale analyses. The design and evaluation of these strategies are critical, as they directly influence the generalizability of research findings. However, sampling remains an underestimated aspect in software engineering research: we identify two main challenges related to (1) the design and representativeness of sampling approaches, and (2) the ability to reason about the implications of sampling decisions on generalizability. To address these challenges, we propose a Domain-Specific Language (DSL) to explicitly describe complex sampling strategies through composable sampling operators. This formalism supports both the specification and the reasoning about the generalizability of results based on the applied sampling strategies. We implement the DSL as a Python-based fluent API, and demonstrate how it facilitates representativeness reasoning using statistical indicators extracted from sampling workflows. We validate our approach through a case study of MSR papers involving code repository sampling. Our results show that the DSL can model the sampling strategies reported in recent literature.
AIAug 8, 2025
Formal Concept Analysis: a Structural Framework for Variability Extraction and AnalysisJessie Galasso
Formal Concept Analysis (FCA) is a mathematical framework for knowledge representation and discovery. It performs a hierarchical clustering over a set of objects described by attributes, resulting in conceptual structures in which objects are organized depending on the attributes they share. These conceptual structures naturally highlight commonalities and variabilities among similar objects by categorizing them into groups which are then arranged by similarity, making it particularly appropriate for variability extraction and analysis. Despite the potential of FCA, determining which of its properties can be leveraged for variability-related tasks (and how) is not always straightforward, partly due to the mathematical orientation of its foundational literature. This paper attempts to bridge part of this gap by gathering a selection of properties of the framework which are essential to variability analysis, and how they can be used to interpret diverse variability information within the resulting conceptual structures.
SEJan 19, 2022
Code Sophistication: From Code Recommendation to Logic RecommendationJessie Galasso, Michalis Famelis, Houari Sahraoui
A typical approach to programming is to first code the main execution scenario, and then focus on filling out alternative behaviors and corner cases. But, almost always, there exist unusual conditions that trigger atypical behaviors, which are hard to predict in program specifications, and are thus often not coded. In this paper, we consider the problem of detecting and recommending such missing behaviors, a task that we call code sophistication. Previous research on coding assistants usually focuses on recommending code fragments based on specifications of the intended behavior. In contrast, code sophistication happens in the absence of a specification, aiming to help developers complete the logic of their programs with missing and unspecified behaviors. We outline the research challenges to this problem and present early results showing how program logic can be completed by leveraging code structure and information about the usage of input parameters.
SEDec 14, 2020
Fixing Multiple Type Errors in Model Transformations with Alternative Oracles to Test CasesZahra VaraminyBahnemiry, Jessie Galasso, Houari Sahraoui
This paper addresses the issue of correcting type errors in model transformations in realistic scenarios where neither predefined patches nor behavior-safe guards such as test suites are available. Instead of using predefined patches targeting isolated errors of specific categories, we propose to explore the space of possible patches by combining basic edit operations for model transformation programs. To guide the search, we define two families of objectives: one to limit the number of type errors and the other to preserve the transformation behavior. To approximate the latter, we study two objectives: minimizing the number of changes and keeping the changes local. Additionally, we define four heuristics to refine candidate patches to increase the likelihood of correcting type errors while preserving the transformation behavior. We implemented our approach for the ATL language using the evolutionary algorithm NSGA-II, and performed an evaluation based on three published case studies. The evaluation results show that our approach was able to automatically correct on average more than82% of type errors for two cases and more than 56% for the third case.