SEAI Jul 30, 2025

Metamorphic Testing of Deep Code Models: A Systematic Literature Review

arXiv:2507.22610v1
4 citations
h-index: 39
Originality: Synthesis-oriented
AI Analysis

This review addresses the robustness of deep code models in software engineering. Its contribution is incremental, since it synthesizes existing research rather than introducing new methods.

The paper presents a systematic literature review of metamorphic testing for deep code models, analyzing 45 papers to summarize the transformations, techniques, and evaluation methods used to assess robustness.

Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field due to their ability to perform various code-related tasks. These models can process source code and software artifacts with high accuracy in tasks such as code completion, defect detection, and code summarization; therefore, they can potentially become an integral part of modern software engineering practices. Despite these capabilities, robustness remains a critical quality attribute for deep code models, as they may produce different results under varied and adversarial conditions (e.g., variable renaming). Metamorphic testing has become a widely used approach to evaluate models' robustness by applying semantic-preserving transformations to input programs and analyzing the stability of model outputs. While prior research has explored testing deep learning models, this systematic literature review focuses specifically on metamorphic testing for deep code models. By studying 45 primary papers, we analyze the transformations, techniques, and evaluation methods used to assess robustness. Our review summarizes the current landscape, identifying frequently evaluated models, programming tasks, datasets, target languages, and evaluation metrics, and highlights key challenges and future directions for advancing the field.
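The metamorphic relation described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from any of the reviewed studies. `toy_model` is a hypothetical, deliberately brittle stand-in for a deep code model, and `rename_variable`, `run`, and `metamorphic_relation_holds` are illustrative helper names. The sketch applies one semantic-preserving transformation (variable renaming), confirms that program behavior is unchanged on a sample input, and then checks whether the model's output stays stable.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier in a parsed program."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Covers both reads and writes of the variable.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Covers function parameters with the same name.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_variable(source, old, new):
    """Return source code with `old` renamed to `new` (requires Python 3.9+)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source, func_name, *args):
    """Execute a snippet and call one of its functions, to compare behavior."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


def toy_model(source):
    """Hypothetical stand-in for a code model: labels code from surface tokens."""
    return "sum" if "total" in source else "unknown"


def metamorphic_relation_holds(model, source, transformed):
    """The relation holds when the model's output is unchanged by the transformation."""
    return model(source) == model(transformed)


ORIGINAL = """
def f(xs):
    total = 0
    for x in xs:
        total += x
    return total
"""

TRANSFORMED = rename_variable(ORIGINAL, "total", "acc")

if __name__ == "__main__":
    # Program behavior matches on this input, so the rename preserved semantics here.
    print(run(ORIGINAL, "f", [1, 2, 3]) == run(TRANSFORMED, "f", [1, 2, 3]))
    # The toy model relies on the identifier name, so the relation is violated.
    print(metamorphic_relation_holds(toy_model, ORIGINAL, TRANSFORMED))
```

A real study would replace `toy_model` with an actual model and would typically use a task-specific stability criterion rather than exact output equality.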
