SD, CL · Jul 13, 2016

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

arXiv:1607.03766v3 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more nuanced audio analysis in multimedia research, though it is incremental: it extends existing sound-recognition methods to new label types.

The paper tackles the analysis of audio content using adjective-noun and verb-noun pairs, which are underexplored compared to single-tag sound recognition. To do so, it builds the AudioPairBank dataset of over 33,000 audio files covering 1,123 pairs and reports 70% accuracy in sound recognition experiments.

Recently, sound recognition has been used to identify sounds such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs, such as slow car, and verb-noun pairs, such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus, consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded 70% accuracy and hence also provide a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis.
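The abstract's central idea is that a sound label can be a pair (a modifier plus a noun) rather than a single tag. Below is a minimal Python sketch, not the authors' code, of how such pair labels and a simple single-label accuracy metric could be represented. `TagPair`, `accuracy`, and the `"ANP"`/`"VNP"` kind tags are hypothetical names chosen for illustration. The sketch also assumes that each clip receives exactly one predicted pair, which may differ from the paper's actual experimental setup.

```python
from dataclasses import dataclass
from typing import Literal, Sequence

# Hypothetical kind tags: adjective-noun pair or verb-noun pair.
PairKind = Literal["ANP", "VNP"]


@dataclass(frozen=True)
class TagPair:
    """A hypothetical pair label such as 'slow car' (ANP) or 'flying insects' (VNP)."""
    modifier: str  # the adjective ("slow") or verb ("flying")
    noun: str      # the sound source ("car", "insects")
    kind: PairKind

    @property
    def label(self) -> str:
        # Class name used when the pair is treated as one recognition target.
        return f"{self.modifier} {self.noun}"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of clips whose predicted pair label exactly matches the true one.

    Assumes single-label classification (one pair per clip).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    pairs = [TagPair("slow", "car", "ANP"), TagPair("flying", "insects", "VNP")]
    print([p.label for p in pairs])  # ['slow car', 'flying insects']
    truth = ["slow car", "flying insects", "slow car"]
    preds = ["slow car", "flying insects", "fast car"]
    print(accuracy(truth, preds))  # 2 of 3 correct
```

Treating each pair as one class keeps the benchmark a standard classification problem, so a confusion between "slow car" and "fast car" counts as an error even though the noun is correct. The 70% figure comes from the paper and is not reproduced by this sketch.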

