AS · SD · May 13

FSD50K-Solo: Automated Curation of Single-Source Sound Events

Ningyuan Yang, Sile Yin, Li-Chia Yang, Bryce Irvin, Xiao Quan, Marko Stamenovic, Shuo Zhang

arXiv:2605.13931 · 24.9 · h-index: 5 · Has Code

Predicted impact: top 11% in AS · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the need for clean, single-source audio datasets for training neural networks, but the approach is incremental as it builds on existing models and datasets.

The authors developed a data curation framework that uses a generative diffusion model and a pre-trained audio encoder to automatically filter multi-source samples from FSD50K, producing a single-source subset called FSD50K-Solo. The method achieved strong performance on a human-curated test set.

High-quality training datasets are essential for the performance of neural networks. However, the audio domain still lacks a large-scale, strongly labeled, single-source sound event dataset. The FSD50K dataset, despite being relatively large and open, contains a considerable fraction of multi-source samples in which background interference or overlapping events could limit the usefulness of the data. To address this challenge, we introduce a data curation framework designed for large-scale open audio corpora. Our approach leverages a generative diffusion model to synthesize clean single-class events, from which we construct controlled noisy mixtures for supervision. We subsequently employ a pre-trained audio encoder coupled with a discriminative classifier to automatically identify and filter out multi-source samples. Experiments show that our framework achieves strong performance on a human expert-curated test set. Finally, we release FSD50K-Solo, a model-curated subset of FSD50K containing single-source audio samples identified by our method. Beyond FSD50K, our method establishes a scalable paradigm for curating open-source audio corpora.
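The abstract does not spell out how the controlled mixtures are built, so the following is only a minimal sketch of one common way to do it: mixing a clean event with an interfering event at a chosen signal-to-noise ratio and labeling the result as multi-source. The function names (`mix_at_snr`, `build_training_pairs`), the SNR values, and the 0/1 label convention are all assumptions made for illustration, not details taken from the paper. Waveforms are plain Python lists of samples to keep the example self-contained.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(target, interferer, snr_db):
    """Add `interferer` to `target`, scaled so that the ratio of the target
    RMS to the scaled interferer RMS equals `snr_db` decibels.
    (Hypothetical helper; the paper's mixing procedure may differ.)"""
    if len(target) != len(interferer):
        raise ValueError("signals must have the same length")
    t_rms, i_rms = rms(target), rms(interferer)
    if t_rms == 0.0 or i_rms == 0.0:
        raise ValueError("signals must be non-silent")
    gain = t_rms / (i_rms * 10 ** (snr_db / 20))
    return [t + gain * i for t, i in zip(target, interferer)]


def build_training_pairs(clean_events, snr_choices_db, rng):
    """Return (waveform, label) pairs for a binary classifier.
    Assumed convention: 0 = single-source, 1 = multi-source."""
    if len(clean_events) < 2:
        raise ValueError("need at least two events to form a mixture")
    pairs = []
    for idx, event in enumerate(clean_events):
        # The clean event itself is a single-source example.
        pairs.append((event, 0))
        # Mix it with a different event to get a multi-source example.
        others = [e for j, e in enumerate(clean_events) if j != idx]
        interferer = rng.choice(others)
        snr_db = rng.choice(snr_choices_db)
        pairs.append((mix_at_snr(event, interferer, snr_db), 1))
    return pairs


# Toy stand-ins for synthesized single-class events: short sine waves.
def sine(freq_hz, n=160, sr=16000):
    return [math.sin(2 * math.pi * freq_hz * k / sr) for k in range(n)]


events = [sine(440), sine(880), sine(1320)]
pairs = build_training_pairs(events, [0, 5, 10, 20], random.Random(0))
```

In a pipeline of the kind the abstract describes, pairs like these would be passed through a pre-trained audio encoder, and a lightweight classifier trained on the resulting embeddings would then score real FSD50K clips as single-source or multi-source.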
