LG · AI · Sep 27, 2025

CoDA: Coding LM via Diffusion Adaptation

arXiv:2510.03270v1 · 4 citations · h-index: 27 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for lightweight and efficient diffusion-based coding assistants for developers and researchers, though it is incremental in scaling down existing methods.

The paper tackles the problem of building practical diffusion language models for coding. It introduces CoDA, a 1.7B-parameter diffusion coder that matches or surpasses diffusion models of up to 7B parameters on the HumanEval, MBPP, and EvalPlus benchmarks.

Diffusion language models promise bidirectional context and infilling capabilities that autoregressive coders lack, yet practical systems remain heavyweight. We introduce CoDA, a 1.7B-parameter diffusion coder trained on TPU with a fully open-source training pipeline. CoDA pairs large-scale diffusion pre-training with code-centric mid-training and instruction tuning, enabling confidence-guided sampling that keeps inference latency competitive. On HumanEval, MBPP, and EvalPlus, CoDA-1.7B-Instruct matches or surpasses diffusion models up to 7B parameters. Our release includes model checkpoints, evaluation harnesses, and TPU training pipelines to accelerate research on lightweight diffusion-based coding assistants.
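The abstract names confidence-guided sampling without describing it. The sketch below shows one common reading of the idea for masked diffusion language models: at each step the model scores every masked position, and only the most confident predictions are committed. It is an illustration under stated assumptions, not CoDA's released implementation. The names `MASK`, `toy_logits`, and `confidence_guided_decode` are hypothetical, the denoiser is replaced by random logits, and the even unmasking schedule is an assumption.

```python
import math
import random

MASK = -1  # hypothetical id marking a position that is still masked


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(tokens, vocab_size, rng):
    """Stand-in for a real denoiser (assumption): random logits per position."""
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in tokens]


def confidence_guided_decode(tokens, logits_fn, steps):
    """Fill MASK positions over at most `steps` rounds, most confident first."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        logits = logits_fn(tokens)
        # For each masked position, record the best token and its probability.
        candidates = []
        for i in masked:
            probs = softmax(logits[i])
            best = max(range(len(probs)), key=probs.__getitem__)
            candidates.append((probs[best], i, best))
        # Assumed schedule: spread the remaining masks evenly over remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        # Commit only the k most confident predictions; the rest stay masked.
        for _, i, best in sorted(candidates, reverse=True)[:k]:
            tokens[i] = best
    return tokens


rng = random.Random(0)
prompt = [3, 1, MASK, MASK, 2, MASK, MASK, MASK]  # known tokens act as infilling context
out = confidence_guided_decode(prompt, lambda t: toy_logits(t, 5, rng), steps=3)
```

Because several positions are committed per step, the number of model calls is bounded by `steps` rather than by sequence length, which is the usual argument for why this style of sampling can keep latency competitive. Unmasked tokens on both sides of a gap are left untouched, which is what makes infilling possible.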

