SEOct 26, 2022
Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?Jean-Baptiste Döderlein, Nguessan Hermann Kouadio, Mathieu Acher et al.
Language models are promising solutions for tackling increasing complex problems. In software engineering, they recently gained attention in code assistants, which generate programs from a natural language task description (prompt). They have the potential to save time and effort but remain poorly understood, limiting their optimal use. In this article, we investigate the impact of input variations on two configurations of a language model, focusing on parameters such as task description, surrounding context, model creativity, and the number of generated solutions. We design specific operators to modify these inputs and apply them to three LLM-based code assistants (Copilot, Codex, StarCoder2) and two benchmarks representing algorithmic problems (HumanEval, LeetCode). Our study examines whether these variations significantly affect program quality and how these effects generalize across models. Our results show that varying input parameters can greatly improve performance, achieving up to 79.27% success in one-shot generation compared to 22.44% for Codex and 31.1% for Copilot in default settings. Actioning this potential in practice is challenging due to the complex interplay in our study - the optimal settings for temperature, prompt, and number of generated solutions vary by problem. Reproducing our study with StarCoder2 confirms these findings, indicating they are not model-specific. We also uncover surprising behaviors (e.g., fully removing the prompt can be effective), revealing model brittleness and areas for improvement.
19.9SEApr 9
Modeling Sampling Workflows for Code RepositoriesRomain Lefeuvre, Maïwenn Le Goasteller, Jessie Galasso et al.
Empirical software engineering research often depends on datasets of code repository artifacts, where sampling strategies are employed to enable large-scale analyses. The design and evaluation of these strategies are critical, as they directly influence the generalizability of research findings. However, sampling remains an underestimated aspect in software engineering research: we identify two main challenges related to (1) the design and representativeness of sampling approaches, and (2) the ability to reason about the implications of sampling decisions on generalizability. To address these challenges, we propose a Domain-Specific Language (DSL) to explicitly describe complex sampling strategies through composable sampling operators. This formalism supports both the specification and the reasoning about the generalizability of results based on the applied sampling strategies. We implement the DSL as a Python-based fluent API, and demonstrate how it facilitates representativeness reasoning using statistical indicators extracted from sampling workflows. We validate our approach through a case study of MSR papers involving code repository sampling. Our results show that the DSL can model the sampling strategies reported in recent literature.
18.0SEMar 19
SpaceTime Programming: Live and Omniscient Exploration of Code and ExecutionJean-Baptiste Döderlein, Djamel Eddine Khelladi, Mathieu Acher et al.
Programming environments typically separate the world of static code from the dynamic execution of programs. Developers must switch between writing code and observing its execution, often with limited tools to understand the relationship between code changes and runtime behavior. Several paradigms and approaches exist to bridge this gap, including exploratory programming for comparing code variants, live programming for immediate feedback, and omniscient debugging for exploring execution history. However, existing solutions tend to focus on specific aspects and one specific paradigm rather than providing a fully integrated environment with multiple capabilities. This paper introduces \spacetime Programming, a novel approach that unifies these paradigms to create a programming model for exploring both code modifications and execution flow. At the core of our approach is a trace mechanism that captures not only execution state but also the corresponding code changes, enabling developers to explore programs in both space (code variants) and time (execution flow). As a proof of concept, we implemented a Python library supporting SpaceTime Programming and applied it in two contexts: a live omniscient debugger and a Pygame game development tool, showcased through a Flappy Bird-like game. We further evaluated SpaceTimePy on five real-world Python projects, finding performance overhead ranging from 35% to 150% on test suites.
PLJan 30, 2017
Industrial Experience Report on the Formal Specification of a Packet Filtering Language Using the K FrameworkGurvan Le Guernic, Benoit Combemale, José A. Galindo
Many project-specific languages, including in particular filtering languages, are defined using non-formal specifications written in natural languages. This leads to ambiguities and errors in the specification of those languages. This paper reports on an industrial experiment on using a tool-supported language specification framework (K) for the formal specification of the syntax and semantics of a filtering language having a complexity similar to those of real-life projects. This experimentation aims at estimating, in a specific industrial setting, the difficulty and benefits of formally specifying a packet filtering language using a tool-supported formal approach.
SEAug 25, 2014
Report on the First Workshop On the Globalization of Modeling LanguagesBenoit Combemale, Julien De Antoni, Robert B. France et al.
The first edition of GEMOC workshop was co-located with the MODELS 2013 conference in Miami, FL, USA. The workshop provided an open forum for sharing experiences, problems and solutions related to the challenges of using of multiple modeling languages in the development of complex software based systems. During the workshop, concrete language composition artifacts, approaches, and mechanisms were presented and discussed, ideas and opinions exchanged, and constructive feedback provided to authors of accepted papers. A major objective was to encourage collaborations and to start building a community that focused on providing solutions that support what we refer to as the globalization of domain-specific modeling languages, that is, support coordinated use of multiple languages throughout the development of complex systems. This report summarizes the presentations and discussions that took place in the first GEMOC 2013 workshop.
SEJun 4, 2013
Mashup of Meta-Languages and its Implementation in the Kermeta Language WorkbenchJean-Marc Jézéquel, Benoit Combemale, Olivier Barais et al.
With the growing use of domain-specific languages (DSL) in industry, DSL design and implementation goes far beyond an activity for a few experts only and becomes a challenging task for thousands of software engineers. DSL implementation indeed requires engineers to care for various concerns, from abstract syntax, static semantics, behavioral semantics, to extra-functional issues such as run-time performance. This paper presents an approach that uses one meta-language per language implementation concern. We show that the usage and combination of those meta-languages is simple and intuitive enough to deserve the term "mashup". We evaluate the approach by completely implementing the non trivial fUML modeling language, a semantically sound and executable subset of the Unified Modeling Language (UML).