SD · Oct 28, 2015

MUSAN: A Music, Speech, and Noise Corpus

arXiv:1510.08484v1 · 1661 citations
Originality: Synthesis-oriented
AI Analysis

MUSAN provides a dataset for researchers and practitioners working on audio processing tasks such as voice activity detection (VAD) and music/speech discrimination. The contribution is incremental: it is a new collection of existing data types.

The authors introduced MUSAN, a new corpus containing music, speech, and noise, designed for training models in voice activity detection and music/speech discrimination, and demonstrated its application on broadcast news and speaker identification tasks.

This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identification.

Code Implementations: 2 repos