SD · AS · Aug 2, 2018

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

arXiv:1808.00606v2 · 54 citations
Originality: Synthesis-oriented
AI Analysis

AVA-Speech gives researchers a shared dataset for comparing speech activity detection methods, addressing a gap in openly available resources. The contribution is incremental in that it focuses on data creation rather than new methods.

The authors address the lack of a publicly available, densely labeled dataset for speech activity detection by creating AVA-Speech, a dataset built from YouTube videos with labels for clean speech, speech with music, and speech with noise. They also report benchmark performance on it using off-the-shelf, state-of-the-art audio and vision models.

Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or on datasets that are not openly available. This makes it difficult to compare approaches and understand their strengths and weaknesses. In this paper, we describe a new dataset which we will release publicly containing densely labeled speech activity in YouTube videos, with the goal of creating a shared, available dataset for this task. The labels in the dataset annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise, which enable analysis of model performance in more challenging conditions based on the presence of overlapping noise. We report benchmark performance numbers on AVA-Speech using off-the-shelf, state-of-the-art audio and vision models that serve as a baseline to facilitate future research.
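To make the role of the condition labels concrete, the following is a minimal sketch of how dense segment labels could be used to break a detector's performance down by condition. It is not the authors' evaluation code. The label strings, the `(start_sec, end_sec, label)` segment layout, the frame size, and the function names `segments_to_frames` and `per_condition_rates` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical label names for illustration; the released dataset's exact
# strings and file layout may differ.
SPEECH_LABELS = {"CLEAN_SPEECH", "SPEECH_WITH_MUSIC", "SPEECH_WITH_NOISE"}
NO_SPEECH = "NO_SPEECH"


def segments_to_frames(segments, frame_sec=0.01):
    """Expand dense (start_sec, end_sec, label) segments into per-frame labels."""
    frames = []
    for start, end, label in segments:
        n_frames = round((end - start) / frame_sec)
        frames.extend([label] * n_frames)
    return frames


def per_condition_rates(frame_labels, predictions):
    """Return (true positive rate per speech condition, false positive rate).

    `predictions` holds one binary speech/non-speech decision per frame.
    The false positive rate is measured on NO_SPEECH frames only.
    """
    if len(frame_labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    hits = defaultdict(int)    # frames predicted as speech, keyed by true label
    totals = defaultdict(int)  # all frames, keyed by true label
    for label, pred in zip(frame_labels, predictions):
        totals[label] += 1
        if pred:
            hits[label] += 1
    tpr = {c: hits[c] / totals[c] for c in sorted(SPEECH_LABELS) if totals[c]}
    fpr = hits[NO_SPEECH] / totals[NO_SPEECH] if totals[NO_SPEECH] else None
    return tpr, fpr
```

Reporting the true positive rate separately for each condition is what lets a reader see whether a model degrades when speech overlaps with music or noise, which a single aggregate score would hide.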

Code Implementations: 1 repo