SEAug 15, 2023
From Commit Message Generation to History-Aware Commit Message CompletionAleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov et al.
Commit messages are crucial to software development, allowing developers to track changes and collaborate effectively. Despite their utility, most commit messages lack important information since writing high-quality commit messages is tedious and time-consuming. The active research on commit message generation (CMG) has not yet led to wide adoption in practice. We argue that if we could shift the focus from commit message generation to commit message completion and use previous commit history as additional context, we could significantly improve the quality and the personal nature of the resulting commit messages. In this paper, we propose and evaluate both of these novel ideas. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. We use this dataset to evaluate the completion setting and the usefulness of the historical context for state-of-the-art CMG models and GPT-3.5-turbo. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages. As for the history, the results show that historical information improves the performance of CMG models in the generation task, and the performance of GPT-3.5-turbo in both generation and completion.
SEJan 1Code
Multi-Agent Coordinated Rename RefactoringAbhiram Bellur, Mohammed Raihan Ullah, Fraol Batole et al.
The primary value of AI agents in software development lies in their ability to extend the developer's capacity for reasoning and action, not to supplant human involvement. To showcase how to use agents working in tandem with developers, we designed a novel approach for carrying out coordinated renaming. Coordinated renaming, where a single rename refactoring triggers refactorings in multiple, related identifiers, is a frequent yet challenging task. Developers must manually propagate these rename refactorings across numerous files and contexts, a process that is both tedious and highly error-prone. State-of-the-art heuristic-based approaches produce an overwhelming number of false positives, while vanilla Large Language Models (LLMs) provide incomplete suggestions due to their limited context and inability to interact with refactoring tools. This leaves developers with incomplete refactorings or burdens them with filtering too many false positives. Coordinated renaming is exactly the kind of repetitive task that agents can significantly reduce the developers' burden while keeping them in the driver's seat. We designed, implemented, and evaluated the first multi-agent framework that automates coordinated renaming. It operates on a key insight: a developer's initial refactoring is a clue to infer the scope of related refactorings. Our Scope Inference Agent first transforms this clue into an explicit, natural-language Declared Scope. The Planned Execution Agent then uses this as a strict plan to identify program elements that should undergo refactoring and safely executes the changes by invoking the IDE's own trusted refactoring APIs. Finally, the Replication Agent uses it to guide the project-wide search. We first conducted a formative study on the practice of coordinated renaming in 609K commits in 100 open-source projects and surveyed 205 developers ...
SEMar 26, 2025Code
Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method RefactoringAbhiram Bellur, Fraol Batole, Mohammed Raihan Ullah et al.
MOVEMETHOD is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform MOVEMETHOD. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes are better hosts. Our formative study of 2016 LLM recommendations revealed that LLMs give expert suggestions, yet they are unreliable: up to 80% of the suggestions are hallucinations. We introduce the first LLM fully powered assistant for MOVEMETHOD refactoring that automates its whole end-to-end lifecycle, from recommendation to execution. We designed novel solutions that automatically filter LLM hallucinations using static analysis from IDEs and a novel workflow that requires LLMs to be self-consistent, critique, and rank refactoring suggestions. As MOVEMETHOD refactoring requires global, projectlevel reasoning, we solved the limited context size of LLMs by employing refactoring-aware retrieval augment generation (RAG). Our approach, MM-assist, synergistically combines the strengths of the LLM, IDE, static analysis, and semantic relevance. In our thorough, multi-methodology empirical evaluation, we compare MM-assist with the previous state-of-the-art approaches. MM-assist significantly outperforms them: (i) on a benchmark widely used by other researchers, our Recall@1 and Recall@3 show a 1.7x improvement; (ii) on a corpus of 210 recent refactorings from Open-source software, our Recall rates improve by at least 2.4x. Lastly, we conducted a user study with 30 experienced participants who used MM-assist to refactor their own code for one week. They rated 82.8% of MM-assist recommendations positively. This shows that MM-assist is both effective and useful.
SEDec 20, 2021Code
Operation-based Refactoring-aware Merging: An Empirical EvaluationMax Ellis, Sarah Nadi, Danny Dig
Dealing with merge conflicts in version control systems is a challenging task for software developers. Resolving merge conflicts is a time-consuming and error-prone process, which distracts developers from important tasks. Recent work shows that refactorings are often involved in merge conflicts and that refactoring-related conflicts tend to be larger, making them harder to resolve. In the literature, there are two refactoring-aware merging techniques that claim to automatically resolve refactoring-related conflicts; however, these two techniques have never been empirically compared. In this paper, we present RefMerge, a rejuvenated Java-based design and implementation of the first technique, which is an operation-based refactoring-aware merging algorithm. We compare RefMerge to Git and the state-of-the-art graph-based refactoring-aware merging tool, IntelliMerge, on 2,001 merge scenarios with refactoring-related conflicts from 20 open-source projects. We find that RefMerge resolves or reduces conflicts in 497 (25%) merge scenarios while increasing conflicting LOC in only 214 (11%) scenarios. On the other hand, we find that IntelliMerge resolves or reduces conflicts in 478 (24%) merge scenarios but increases conflicting LOC in 597 (30%) merge scenarios. We additionally conduct a qualitative analysis of the differences between the three merging algorithms and provide insights of the strengths and weaknesses of each tool. We find that while IntelliMerge does well with ordering and formatting conflicts, it struggles with class-level refactorings and scenarios with several refactorings. On the other hand, RefMerge is resilient to the number of refactorings in a merge scenario, but we find that RefMerge introduces conflicts when inverting move-related refactorings.
SEDec 7, 2021Code
IntelliTC: Automating Type Changes in IntelliJ IDEAOleg Smirnov, Ameya Ketkar, Timofey Bryksin et al.
Developers often change types of program elements. Such refactoring often involves updating not only the type of the element itself, but also the API of all type-dependent references in the code, thus it is tedious and time-consuming. Despite type changes being more frequent than renamings, just a few current IDE tools provide partially-automated support only for a small set of hard-coded types. Researchers have recently proposed a data-driven approach to inferring API rewrite rules for type change patterns in Java using code commits history. In this paper, we build upon these recent advances and introduce IntelliTC - a tool to perform Java type change refactoring. We implemented it as a plugin for IntelliJ IDEA, a popular Java IDE developed by JetBrains. We present 3 different ways of providing support for such a refactoring from the standpoint of the user experience: Classic mode, Suggested Refactoring, and Inspection mode. To evaluate these modalities of using IntelliTC, we surveyed 22 experienced software developers. They positively rated the usefulness of the tool. The source code and distribution of the plugin are available on GitHub: https://github.com/JetBrains-Research/data-driven-type-migration. A demonstration video is on YouTube: https://youtu.be/pdcfvADA1PY.
SEOct 15, 2024
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline SettingsPetr Tsvetkov, Aleksandra Eliseeva, Danny Dig et al.
When a Commit Message Generation (CMG) system is integrated into the IDEs and other products at JetBrains, we perform online evaluation based on user acceptance of the generated messages. However, performing online experiments with every change to a CMG system is troublesome, as each iteration affects users and requires time to collect enough statistics. On the other hand, offline evaluation, a prevalent approach in the research literature, facilitates fast experiments but employs automatic metrics that are not guaranteed to represent the preferences of real users. In this work, we describe a novel way we employed to deal with this problem at JetBrains, by leveraging an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. To support this new type of evaluation, we develop a novel markup collection tool mimicking the real workflow with a CMG system, collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts, and design and verify a way to synthetically extend such a dataset. Then, we use the final dataset of 656 pairs to study how the widely used similarity metrics correlate with the online metric reflecting the real users' experience. Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation. This contradicts the previous studies on similarity metrics for CMG, suggesting that user interactions with a CMG system in real-world settings differ significantly from the responses by human labelers within controlled environments. We release all the code and the dataset to support future research in the field: https://jb.gg/cmg-evaluation.
SEJul 4, 2020
30 Years of Software Refactoring Research:A Systematic Literature ReviewChaima Abid, Vahid Alizadeh, Marouane Kessentini et al.
Due to the growing complexity of software systems, there has been a dramatic increase and industry demand for tools and techniques on software refactoring in the last ten years, defined traditionally as a set of program transformations intended to improve the system design while preserving the behavior. Refactoring studies are expanded beyond code-level restructuring to be applied at different levels (architecture, model, requirements, etc.), adopted in many domains beyond the object-oriented paradigm (cloud computing, mobile, web, etc.), used in industrial settings and considered objectives beyond improving the design to include other non-functional requirements (e.g., improve performance, security, etc.). Thus, challenges to be addressed by refactoring work are, nowadays, beyond code transformation to include, but not limited to, scheduling the opportune time to carry refactoring, recommendations of specific refactoring activities, detection of refactoring opportunities, and testing the correctness of applied refactorings. Therefore, the refactoring research efforts are fragmented over several research communities, various domains, and objectives. To structure the field and existing research results, this paper provides a systematic literature review and analyzes the results of 3183 research papers on refactoring covering the last three decades to offer the most scalable and comprehensive literature review of existing refactoring research studies. Based on this survey, we created a taxonomy to classify the existing research, identified research trends, and highlighted gaps in the literature and avenues for further research.