SD, AI, IT, LG, AS
Jul 11, 2025

Token-based Audio Inpainting via Discrete Diffusion

Meta AI
arXiv:2507.08333v3
1 citation
h-index: 22
Originality: Highly original
AI Analysis

This work advances musical audio restoration by addressing a known bottleneck in diffusion-based methods for large gaps, with potential applications in audio editing and preservation.

The paper tackles audio inpainting for large missing segments in degraded recordings by applying discrete diffusion over tokenized music representations. It consistently outperforms strong baselines for gaps of 150 ms and above on the MusicNet and MAESTRO datasets.

Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods exhibit impaired performance when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training approaches: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines across a range of gap lengths for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Audio examples of our proposed method can be found at https://iftach21.github.io/.
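
The two training ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names `span_absorb` and `first_difference_penalty`, the `MASK` id, the fixed span length, and the use of a mean squared first difference as the smoothness penalty are all assumptions chosen for illustration. Token ids are assumed to be non-negative integers.

```python
import random

MASK = -1  # hypothetical id for the absorbing (mask) state; real tokenizers differ


def span_absorb(tokens, mask_ratio, span_len, rng):
    """Corrupt a token sequence by absorbing contiguous spans into MASK.

    Illustrative stand-in for a span-based absorbing transition: instead of
    masking positions independently, whole runs of up to `span_len` tokens
    are masked until about `mask_ratio` of the sequence is absorbed.
    """
    out = list(tokens)
    n = len(out)
    target = min(n, round(mask_ratio * n))  # number of positions to absorb
    masked = 0
    while masked < target:
        start = rng.randrange(n)  # random span start
        for i in range(start, min(start + span_len, n)):
            if masked >= target:
                break
            if out[i] != MASK:
                out[i] = MASK
                masked += 1
    return out


def first_difference_penalty(values):
    """Mean squared first difference of a sequence of scalars.

    Assumed form of a derivative-based smoothness term: it is small when
    consecutive values change slowly and large when they jump.
    """
    if len(values) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(values, values[1:])]
    return sum(d * d for d in diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    corrupted = span_absorb(list(range(20)), mask_ratio=0.4, span_len=4, rng=rng)
    print(corrupted)
    print(first_difference_penalty([0.0, 1.0, 3.0]))
```

In an actual discrete diffusion setup the mask ratio would follow the noise schedule at each timestep, and the smoothness term would be computed on model outputs rather than raw scalars; both details are left out here because the abstract does not specify them.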

