SD AI IT LG AS | Jul 11, 2025

Token-based Audio Inpainting via Discrete Diffusion

Tali Dror, Iftach Shoham, Moshe Buchris, Oren Gal, Haim Permuter, Gilad Katz, Eliya Nachmani

Meta AI

arXiv:2507.08333v34.01 citations | h-index: 22

Originality: Highly original

AI Analysis

This work advances musical audio restoration by addressing a known bottleneck in diffusion-based methods for large gaps, with potential applications in audio editing and preservation.

The paper tackles the problem of audio inpainting for large missing segments in degraded recordings by applying discrete diffusion over tokenized music representations, achieving consistent outperformance over strong baselines for gaps of 150 ms and above on MusicNet and MAESTRO datasets.

Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods exhibit impaired performance when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training approaches: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines across a range of gap lengths, for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Audio examples of our proposed method can be found at https://iftach21.github.io/.
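
The abstract names two training ingredients without giving their formulas. The sketch below is only an illustration of the general ideas, not the authors' formulation: `span_absorb` corrupts a token sequence by replacing contiguous spans with an absorbing mask state, and `first_difference_penalty` is one simple way to penalize abrupt frame-to-frame changes. The names `MASK`, `span_absorb`, `first_difference_penalty`, and the parameters `mask_fraction` and `span_len` are all hypothetical, and the assumption that the regularizer is a mean squared first-order difference is ours.

```python
import random

MASK = -1  # hypothetical id for the absorbing mask state (assumed unused by real tokens)


def span_absorb(tokens, mask_fraction, span_len, rng):
    """Replace contiguous spans with MASK until about mask_fraction of positions are absorbed.

    Assumption: spans start at uniformly random positions; the paper's schedule may differ.
    """
    out = list(tokens)
    n = len(out)
    target = min(n, round(mask_fraction * n))
    masked = 0
    while masked < target:
        start = rng.randrange(n)
        # Absorb a run of up to span_len tokens, stopping once the target count is reached.
        for i in range(start, min(start + span_len, n)):
            if masked >= target:
                break
            if out[i] != MASK:
                out[i] = MASK
                masked += 1
    return out


def first_difference_penalty(frames):
    """Mean squared first-order difference along time for a sequence of scalar features.

    Assumption: a stand-in for a derivative-based smoothness term.
    """
    if len(frames) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(frames, frames[1:])]
    return sum(d * d for d in diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    corrupted = span_absorb(list(range(100, 120)), mask_fraction=0.5, span_len=4, rng=rng)
    print(corrupted)
    print(first_difference_penalty([0.0, 1.0, 3.0]))
```

Masking whole spans rather than independent positions makes the training-time corruption resemble the contiguous gaps that inpainting has to fill, which is presumably the motivation for a structured transition.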
