SD, AI, LG, MM, AS
May 7, 2025

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

NVIDIA
arXiv:2505.04621v1 | 3 citations | h-index: 12
Originality: Synthesis-oriented
AI Analysis

The paper provides a versatile framework for audio tasks using generative priors, though the contribution is incremental: it extends an existing method, Score Distillation Sampling, to a new domain.

The authors introduce Audio-SDS, a generalization of Score Distillation Sampling to text-conditioned audio diffusion models, enabling tasks such as source separation, synthesis, and calibration without specialized datasets.

We introduce Audio-SDS, a generalization of Score Distillation Sampling (SDS) to text-conditioned audio diffusion models. While SDS was initially designed for text-to-3D generation using image diffusion, its core idea of distilling a powerful generative prior into a separate parametric representation extends to the audio domain. Leveraging a single pretrained model, Audio-SDS enables a broad range of tasks without requiring specialized datasets. In particular, we demonstrate how Audio-SDS can guide physically informed impact sound simulations, calibrate FM-synthesis parameters, and perform prompt-specified source separation. Our findings illustrate the versatility of distillation-based methods across modalities and establish a robust foundation for future work using generative priors in audio tasks.
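The core idea in the abstract, distilling a generative prior into a separate parametric representation, can be illustrated with a toy sketch. This is not the paper's implementation. A closed-form Gaussian noise predictor stands in for the pretrained text-conditioned audio diffusion model, a one-parameter sinusoid stands in for the synthesizer, and every name here (`render`, `toy_noise_predictor`, `sds_fit_amplitude`) is hypothetical. The update follows the generic SDS recipe: noise the rendered signal, ask the prior to predict the noise, and push the residual back through the renderer only, skipping the denoiser's Jacobian.

```python
import numpy as np

def render(amplitude, basis):
    # Hypothetical differentiable "synthesizer": one parameter scales a fixed sinusoid.
    return amplitude * basis

def toy_noise_predictor(z_t, alpha, sigma, prior_mean, prior_std):
    # Stand-in for a pretrained diffusion model (assumption, not the paper's model).
    # If clean data ~ N(prior_mean, prior_std^2 I) and z_t = alpha*x + sigma*eps,
    # then E[eps | z_t] has this closed form.
    return sigma * (z_t - alpha * prior_mean) / (alpha**2 * prior_std**2 + sigma**2)

def sds_fit_amplitude(steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = np.arange(64)
    basis = np.sin(2 * np.pi * 4 * n / 64)   # 4 cycles over 64 samples
    prior_mean = 0.8 * basis                 # the toy prior "prefers" amplitude 0.8
    prior_std = 0.1
    amplitude = 0.0                          # parameter being optimized

    for _ in range(steps):
        # Sample a noise level and build a noised version of the rendered signal.
        t = rng.uniform(0.2, 0.98)
        alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
        eps = rng.standard_normal(basis.shape)
        x = render(amplitude, basis)
        z_t = alpha * x + sigma * eps

        # SDS-style update with w(t) = 1: residual (eps_hat - eps) times dx/d(amplitude).
        # Here dx/d(amplitude) = basis; the mean over samples only rescales the step.
        eps_hat = toy_noise_predictor(z_t, alpha, sigma, prior_mean, prior_std)
        grad = np.mean((eps_hat - eps) * basis)
        amplitude -= lr * grad

    return amplitude

if __name__ == "__main__":
    print(sds_fit_amplitude())
```

In this toy setting the parameter is pulled toward the amplitude the prior favors. With a real audio diffusion model the noise predictor would be a neural network conditioned on a text prompt, and the renderer could be an FM synthesizer, an impact-sound simulator, or a source-separation mask, as the abstract describes.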

