SE · AI · Aug 22, 2025

From Benchmark Data To Applicable Program Repair: An Experience Report

arXiv:2508.16071v1 · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work highlights the gap between academic benchmarks and practical industry needs in program repair, showing incremental improvements but limited real-world applicability.

The paper tackles automated program repair by combining existing techniques, finding that while benchmark performance improves, these methods fail on realistic industry defects. It shows that augmenting code with formal specifications helps LLMs generate better unit tests for complex production code, but specifications add little value for simple errors and real-world adoption remains limited due to issues like insufficient language expressiveness.

This paper describes our approach to automated program repair. We combine various techniques from the literature to achieve this. Our experiments show that our approach performs better than other techniques on standard benchmarks. However, on closer inspection, none of these techniques work on realistic defects that we see in industry. We find that augmenting code with formal specifications enables LLMs to generate higher-quality unit tests, especially for complex production code with improved coverage of edge cases and exception handling. However, specifications add little value for well-understood errors (e.g., null pointer, index out of bounds), but are beneficial for logic and string manipulation errors. Despite encouraging benchmark results, real-world adoption is limited since passing tests do not guarantee correct patches. Current challenges include insufficient expressiveness of the JML specification language, necessitating advanced verification tools and richer predicates. Our ongoing work is exploring contract automata, programming by example, and test-case repair, with a focus on integrating human feedback and measuring productivity gains, highlighting the gap between academic benchmarks and practical industry needs.
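To make two of the abstract's claims concrete (specifications help with string manipulation errors, and passing tests do not guarantee correct patches), here is a small illustrative sketch that is not taken from the paper. The method names `truncate` and `patchedTruncate`, the inputs, and the hand-written `satisfiesContract` checker are all hypothetical. The JML annotations sit in ordinary comments, so plain `javac` ignores them; a JML tool such as OpenJML would be needed to check them automatically, which is why this sketch mirrors the postconditions by hand.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical example: a JML-annotated string method and a plausible
// "repaired" version that passes a weak unit test but violates the contract.
public class SpecDemo {

    /*@ requires s != null && max >= 0;
      @ ensures s.length() <= max ==> \result.equals(s);
      @ ensures s.length() > max ==> (\result.length() == max && s.startsWith(\result));
      @*/
    static String truncate(String s, int max) {
        return s.length() <= max ? s : s.substring(0, max);
    }

    // Hypothetical patch with an off-by-one error on short inputs.
    static String patchedTruncate(String s, int max) {
        return s.substring(0, Math.min(max, s.length() - 1));
    }

    // Hand-translated runtime check of the two JML postconditions above.
    static boolean satisfiesContract(String s, int max, String result) {
        if (s.length() <= max) {
            return result.equals(s);
        }
        return result.length() == max && s.startsWith(result);
    }

    // Returns the first input that violates the contract, or null if none does.
    static String firstViolation(BiFunction<String, Integer, String> f,
                                 List<String> inputs, int max) {
        for (String s : inputs) {
            String result;
            try {
                result = f.apply(s, max);
            } catch (RuntimeException e) {
                return s; // throwing on a valid input also breaks the contract
            }
            if (!satisfiesContract(s, max, result)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A weak unit test that the buggy patch still passes.
        System.out.println(patchedTruncate("hello", 3).equals("hel"));
        List<String> inputs = List.of("hello", "hi", "abc", "");
        System.out.println(firstViolation(SpecDemo::truncate, inputs, 3));
        System.out.println(firstViolation(SpecDemo::patchedTruncate, inputs, 3));
    }
}
```

Running `main` prints `true`, then `null` (the original satisfies the contract on every input), then `hi`: the patch passes the single example-based test, but the specification exposes the off-by-one on strings shorter than `max`.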
