Nikolaos Tsantalis

SE
h-index30
6papers
358citations
Novelty37%
AI Score42

6 Papers

SEMar 26, 2025Code
Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring

Abhiram Bellur, Fraol Batole, Mohammed Raihan Ullah et al.

MOVEMETHOD is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform MOVEMETHOD. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes are better hosts. Our formative study of 2016 LLM recommendations revealed that LLMs give expert suggestions, yet they are unreliable: up to 80% of the suggestions are hallucinations. We introduce the first LLM fully powered assistant for MOVEMETHOD refactoring that automates its whole end-to-end lifecycle, from recommendation to execution. We designed novel solutions that automatically filter LLM hallucinations using static analysis from IDEs and a novel workflow that requires LLMs to be self-consistent, critique, and rank refactoring suggestions. As MOVEMETHOD refactoring requires global, projectlevel reasoning, we solved the limited context size of LLMs by employing refactoring-aware retrieval augment generation (RAG). Our approach, MM-assist, synergistically combines the strengths of the LLM, IDE, static analysis, and semantic relevance. In our thorough, multi-methodology empirical evaluation, we compare MM-assist with the previous state-of-the-art approaches. MM-assist significantly outperforms them: (i) on a benchmark widely used by other researchers, our Recall@1 and Recall@3 show a 1.7x improvement; (ii) on a corpus of 210 recent refactorings from Open-source software, our Recall rates improve by at least 2.4x. Lastly, we conducted a user study with 30 experienced participants who used MM-assist to refactor their own code for one week. They rated 82.8% of MM-assist recommendations positively. This shows that MM-assist is both effective and useful.

SEDec 7, 2021Code
IntelliTC: Automating Type Changes in IntelliJ IDEA

Oleg Smirnov, Ameya Ketkar, Timofey Bryksin et al.

Developers often change types of program elements. Such refactoring often involves updating not only the type of the element itself, but also the API of all type-dependent references in the code, thus it is tedious and time-consuming. Despite type changes being more frequent than renamings, just a few current IDE tools provide partially-automated support only for a small set of hard-coded types. Researchers have recently proposed a data-driven approach to inferring API rewrite rules for type change patterns in Java using code commits history. In this paper, we build upon these recent advances and introduce IntelliTC - a tool to perform Java type change refactoring. We implemented it as a plugin for IntelliJ IDEA, a popular Java IDE developed by JetBrains. We present 3 different ways of providing support for such a refactoring from the standpoint of the user experience: Classic mode, Suggested Refactoring, and Inspection mode. To evaluate these modalities of using IntelliTC, we surveyed 22 experienced software developers. They positively rated the usefulness of the tool. The source code and distribution of the plugin are available on GitHub: https://github.com/JetBrains-Research/data-driven-type-migration. A demonstration video is on YouTube: https://youtu.be/pdcfvADA1PY.

73.0SEMay 4
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby

The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introduces a distinct machine signature of defects. Our multi-scale analysis, spanning single-file algorithmic tasks and complex, agent generated systems, identifies a fundamental Reasoning-Complexity Trade-off: as models become more capable, they generate increasingly bloated and coupled code. This architectural decay is so pronounced that we establish a Volume-Quality Inverse Law, where code volume is a near perfect predictor of structural degradation. Crucially, we demonstrate that neither functional correctness nor detailed prompting mitigates this decay. These findings challenge the current paradigm of prompt-driven generation, reframing the central problem of AI-based software engineering from one of code generation to one of architectural complexity management. We conclude that future progress depends on equipping agents with explicit architectural foresight to ensure the software they build is not just functional, but also maintainable.

SEAug 24, 2021
An Empirical Study on Refactoring-Inducing Pull Requests

Flávia Coelho, Nikolaos Tsantalis, Tiago Massoni et al.

Background: Pull-based development has shaped the practice of Modern Code Review (MCR), in which reviewers can contribute code improvements, such as refactorings, through comments and commits in Pull Requests (PRs). Past MCR studies uniformly treat all PRs, regardless of whether they induce refactoring or not. We define a PR as refactoring-inducing, when refactoring edits are performed after the initial commit(s), as either a result of discussion among reviewers or spontaneous actions carried out by the PR developer. Aims: This mixed study (quantitative and qualitative) explores code reviewing-related aspects intending to characterize refactoring-inducing PRs. Method: We hypothesize that refactoring-inducing PRs have distinct characteristics than non-refactoring-inducing ones and thus deserve special attention and treatment from researchers, practitioners, and tool builders. To investigate our hypothesis, we mined a sample of 1,845 Apache's merged PRs from GitHub, mined refactoring edits in these PRs, and ran a comparative study between refactoring-inducing and non-refactoring-inducing PRs. We also manually examined 2,096 review comments and 1,891 detected refactorings from 228 refactoring-inducing PRs. Results: We found 30.2% of refactoring-inducing PRs in our sample and that they significantly differ from non-refactoring-inducing ones in terms of number of commits, code churn, number of file changes, number of review comments, length of discussion, and time to merge. However, we found no statistical evidence that the number of reviewers is related to refactoring-inducement. Our qualitative analysis revealed that at least one refactoring edit was induced by review in 133 (58.3%) of the refactoring-inducing PRs examined. Conclusions: Our findings suggest directions for researchers, practitioners, and tool builders to improve practices around pull-based code review.

SEOct 27, 2020
Dependency Smells in JavaScript Projects

Abbas Javan Jafari, Diego Elias Costa, Rabe Abdalkareem et al.

Dependency management in modern software development poses many challenges for developers who wish to stay up to date with the latest features and fixes whilst ensuring backwards compatibility. Project maintainers have opted for varied, and sometimes conflicting, approaches for maintaining their dependencies. Opting for unsuitable approaches can introduce bugs and vulnerabilities into the project, introduce breaking changes, cause extraneous installations, and reduce dependency understandability, making it harder for others to contribute effectively. In this paper, we empirically examine evidence of recurring dependency management issues (dependency smells). We look at the commit data for a dataset of 1,146 active JavaScript repositories to catalog, quantify and understand dependency smells. Through a series of surveys with practitioners, we identify and quantify seven dependency smells with varying degrees of popularity and investigate why they are introduced throughout project history. Our findings indicate that dependency smells are prevalent in JavaScript projects with two or more distinct smells appearing in 80% of the projects, but they generally infect a minority of a project's dependencies. Our observations show that the number of dependency smells tend to increase over time. Practitioners agree that dependency smells bring about many problems including security threats, bugs, dependency breakage, runtime errors, and other maintenance issues. These smells are generally introduced as developers react to dependency misbehaviour and the shortcomings of the npm ecosystem.

SEJul 8, 2016
Why We Refactor? Confessions of GitHub Contributors

Danilo Silva, Nikolaos Tsantalis, Marco Tulio Valente

Refactoring is a widespread practice that helps developers to improve the maintainability and readability of their code. However, there is a limited number of studies empirically investigating the actual motivations behind specific refactoring operations applied by developers. To fill this gap, we monitored Java projects hosted on GitHub to detect recently applied refactorings, and asked the developers to ex- plain the reasons behind their decision to refactor the code. By applying thematic analysis on the collected responses, we compiled a catalogue of 44 distinct motivations for 12 well-known refactoring types. We found that refactoring activity is mainly driven by changes in the requirements and much less by code smells. Extract Method is the most versatile refactoring operation serving 11 different purposes. Finally, we found evidence that the IDE used by the developers affects the adoption of automated refactoring tools.