50.8LGMay 25
Merge-Bench: Resolve Merge Conflicts with Large Language ModelsBenedikt Schesch, Michael D. Ernst
This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hunks from 1439 GitHub repositories. The ground truth is the merge resolution that developers committed to the repository. Our dataset construction methodology is scalable to arbitrary amounts of data since no manual labeling is required. (2) We trained a model, LLMergeJ, to resolve merge conflicts in Java programs. Our approach uses Group Relative Policy Optimization (GRPO), an online reinforcement learning method, to train a Large Language Model (LLM). (3) We performed two evaluations of the performance of LLMs on resolving merge conflicts. On Java programs, LLMergeJ with 14B parameters outperforms 3 commercial LLMs, trailing only Gemini 2.5 Pro. Across 11 programming languages, commercial LLM performance is largely stable from language to language. The best models correctly resolve less than 60% of merge conflicts.
DCJun 13, 2018
A Graphical Interactive Debugger for Distributed SystemsDoug Woos, Zachary Tatlock, Michael D. Ernst et al.
Designing and debugging distributed systems is notoriously difficult. The correctness of a distributed system is largely determined by its handling of failure scenarios. The sequence of events leading to a bug can be long and complex, and it is likely to include message reorderings and failures. On single-node systems, interactive debuggers enable stepping through an execution of the program, but they lack the ability to easily simulate failure scenarios and control the order in which messages are delivered. Oddity is a graphical, interactive debugger for distributed systems. It brings the power of traditional step-through debugging---fine-grained control and observation of a program as it executes---to distributed systems. It also enables exploratory testing, in which an engineer examines and perturbs the behavior of a system in order to better understand it, perhaps without a specific bug in mind. A programmer can directly control message and failure interleaving. Oddity supports time travel, allowing a developer to explore multiple branching executions of a system within a single debugging session. Above all, Oddity encourages distributed systems thinking: rather than assuming the normal case and attaching failure handling as an afterthought, distributed systems should be developed around the certainty of message loss and node failure. Graduate and undergraduate students used Oddity in two distributed systems classes. Usage tracking and qualitative surveys showed that students found Oddity useful for both debugging and exploratory testing.
SEMar 27, 2018
An Empirical Study of Fault Localization Families and Their CombinationsDaming Zou, Jingjing Liang, Yingfei Xiong et al.
The performance of fault localization techniques is critical to their adoption in practice. This paper reports on an empirical study of a wide range of fault localization techniques on real-world faults. Different from previous studies, this paper (1) considers a wide range of techniques from different families, (2) combines different techniques, and (3) considers the execution time of different techniques. Our results reveal that a combined technique significantly outperforms any individual technique (200% increase in faults localized in Top 1), suggesting that combination may be a desirable way to apply fault localization techniques and that future techniques should also be evaluated in the combined setting. Our implementation is publicly available for evaluating and combining fault localization techniques.
CLFeb 25, 2018
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating SystemXi Victoria Lin, Chenglong Wang, Luke Zettlemoyer et al.
We present new data and semantic parsing methods for the problem of mapping English sentences to Bash commands (NL2Bash). Our long-term goal is to enable any user to perform operations such as file manipulation, search, and application-specific scripting by simply stating their goals in English. We take a first step in this domain, by providing a new dataset of challenging but commonly used Bash commands and expert-written English descriptions, along with baseline methods to establish performance levels on this task.
SEMar 12, 2013
Using State Infection Conditions to Detect Equivalent Mutants and Speed up Mutation AnalysisRené Just, Michael D. Ernst, Gordon Fraser
Mutation analysis evaluates test suites and testing techniques by measuring how well they detect seeded defects (mutants). Even though well established in research, mutation analysis is rarely used in practice due to scalability problems --- there are multiple mutations per code statement leading to a large number of mutants, and hence executions of the test suite. In addition, the use of mutation to improve test suites is futile for mutants that are equivalent, which means that there exists no test case that distinguishes them from the original program. This paper introduces two optimizations based on state infection conditions, i.e., conditions that determine for a test execution whether the same execution on a mutant would lead to a different state. First, redundant test execution can be avoided by monitoring state infection conditions, leading to an overall performance improvement. Second, state infection conditions can aid in identifying equivalent mutants, thus guiding efforts to improve test suites.