SE | May 19

MuMuTestUp: Mutation-based Multi-Agent Test Case Update

arXiv:2605.19265 | 70.6 | Has Code
AI Analysis

For developers maintaining test suites in rapidly evolving software projects, MuMuTestUp improves test update effectiveness by focusing on fault detection and coverage, addressing a practical bottleneck in CI/CD.

MuMuTestUp is a mutation-guided multi-agent framework for automatic test case updates in CI/CD pipelines. It addresses limitations of existing LLM-based approaches by strengthening assertions, targeting specific uncovered lines and branches, and using semantic retrieval. On PRBENCH, a new 571-sample dataset drawn from 10 Java projects, it outperforms state-of-the-art baselines with both open-source and closed-source LLMs.
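To make "strengthening assertions" concrete, the following is a minimal sketch of what a surviving mutant is and how a tighter assertion kills it. It is not the paper's implementation: the function `apply_discount`, its mutant, and both tests are hypothetical names invented for this illustration, and Python is used only for brevity (the benchmark projects are Java).

```python
def apply_discount(price, rate):
    """Hypothetical code under test."""
    return price - price * rate


def apply_discount_mutant(price, rate):
    """Hypothetical mutant: the '-' operator is replaced with '+'."""
    return price + price * rate


def weak_test(fn):
    """Passes whenever a float comes back; it never checks the value."""
    result = fn(100.0, 0.2)
    return isinstance(result, float)


def strengthened_test(fn):
    """Also asserts the expected value, so a wrong result fails the test."""
    result = fn(100.0, 0.2)
    return isinstance(result, float) and abs(result - 80.0) < 1e-9


def surviving(test, mutants):
    """Return the names of mutants that the test fails to detect."""
    return [m.__name__ for m in mutants if test(m)]


if __name__ == "__main__":
    mutants = [apply_discount_mutant]
    print("weak test survivors:", surviving(weak_test, mutants))
    print("strengthened test survivors:", surviving(strengthened_test, mutants))
```

Both tests pass on the original function and would execute the same lines, so line coverage alone cannot tell them apart. Only the strengthened test detects the mutant, which is the gap that mutation feedback is meant to expose.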

Modern software systems evolve rapidly under CI/CD practices, where tests are critical for quality. However, substantial code changes often render existing test cases obsolete, causing pipeline disruptions, reduced productivity, and compromised quality. Recent automatic test update approaches leverage LLMs to refine test cases via execution feedback and exact-matching context retrieval. They prioritize executability and line coverage but suffer from three limitations: (1) they neglect test assertion adequacy, which weakens fault detection; (2) they rely on coarse line coverage instead of specific uncovered lines and branches; (3) they use exact-matching retrieval, which fails when the LLM issues hallucinated queries. To address these, we propose MuMuTestUp, a mutation-guided multi-agent framework with three specialized agents: Mutation Analysis (strengthens assertions using surviving mutants), Coverage Analysis (generates targeted repair instructions for uncovered lines and branches), and Semantic Retrieval (handles hallucinated queries via semantic-similarity search). We also construct PRBENCH, a 571-sample pull-request-level dataset from 10 open-source Java projects (validated for cross-commit update scenarios). Evaluations against state-of-the-art baselines use both open-source (DeepSeek-V3.2) and closed-source (GPT-4.1) LLMs.
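The third limitation can be illustrated with a small sketch of exact-matching retrieval falling back to a similarity search. Everything here is an assumption made for illustration: the index contents and method names are hypothetical, and `difflib` string similarity from the Python standard library is used as a simple stand-in for the semantic-similarity search the abstract describes, which would more plausibly rely on embeddings.

```python
import difflib

# Hypothetical index mapping qualified method names to source snippets.
CODE_INDEX = {
    "OrderService.calculateTotal": "double calculateTotal(Order o) { ... }",
    "OrderService.applyCoupon": "void applyCoupon(Order o, String code) { ... }",
    "InventoryClient.reserveStock": "boolean reserveStock(String sku, int n) { ... }",
}


def exact_lookup(query, index):
    """Exact-matching retrieval: returns None unless the name matches exactly."""
    return query if query in index else None


def similarity_lookup(query, index, cutoff=0.6):
    """Return the closest indexed name by string similarity, or None.

    Stand-in for semantic search; the cutoff is an arbitrary assumption.
    """
    matches = difflib.get_close_matches(query, list(index), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def retrieve(query, index):
    """Try an exact match first, then fall back to the similarity search."""
    return exact_lookup(query, index) or similarity_lookup(query, index)


if __name__ == "__main__":
    # A plausible but non-existent name, as an LLM might hallucinate.
    query = "OrderService.computeTotal"
    print("exact:", exact_lookup(query, CODE_INDEX))
    print("with fallback:", retrieve(query, CODE_INDEX))
```

The exact lookup returns nothing for the hallucinated name, while the fallback recovers the nearest real method. A query with no close neighbour still returns `None`, so the caller can tell that no usable context was found.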

