SEAICLAug 14, 2025

Diffusion is a code repair operator and generator

Microsoft
arXiv:2508.11110v1h-index: 65
Originality Incremental advance
AI Analysis

This work addresses the problem of efficient and scalable code repair for developers, though it is incremental as it builds on existing diffusion models.

The paper tackles the problem of last-mile code repair by showing that later steps in code diffusion models resemble repair operations, enabling two applications: directly repairing broken code by adding noise and resuming diffusion, and generating training data for repair tasks. Experiments across Python, Excel, and PowerShell domains demonstrate these applications and analyze their properties.

Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete representations of these snippets look like last-mile repairs applied to broken or incomplete code. We evaluate the extent to which this resemblance can be exploited to leverage pre-trained code diffusion models for the problem of last-mile repair by considering two applications with significant potential. First, we can leverage the diffusion model for last-mile repair by adding noise to a broken code snippet and resuming the diffusion process. Second, we can leverage the diffusion model to generate arbitrary amount of training data for last-mile repair tasks (that are computationally more efficient) by sampling an intermediate program (input) and the final program (output) from the diffusion process. We perform experiments on 3 domains (Python, Excel and PowerShell) to evaluate applications, as well as analyze properties.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes