AS · IR · LG · SD · Feb 20, 2020

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

arXiv:2002.09026v1 · 4 citations
AI Analysis

This work addresses a gap in sound event retrieval for realistic soundscapes in which multiple events co-occur, though it appears incremental, since it extends existing single-label methods.

The paper tackles sound event retrieval for multi-label audio recordings, in which several sound events co-occur. It proposes deep learning architectures that combine a Siamese structure with a Pairwise Presence Matrix and evaluates them on the SONYC-UST dataset, where the reported results indicate that the approach is effective.
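The summary names a Pairwise Presence Matrix but does not define it. Purely as an illustration, the sketch below assumes one plausible form: the outer product of two recordings' multi-hot label vectors, whose diagonal marks the events both recordings contain, plus a Jaccard overlap as a possible soft similarity target for a Siamese pair. The label list is taken from the example events in the abstract; `multi_hot`, `pairwise_presence_matrix`, `shared_labels`, and `jaccard_target` are hypothetical names, and none of this should be read as the paper's actual definition.

```python
# Hypothetical sketch: LABELS, multi_hot, pairwise_presence_matrix and
# jaccard_target are illustrative names, not taken from the paper.
LABELS = ["car_horn", "engine", "human_voice"]


def multi_hot(events, labels=LABELS):
    """Encode a recording's set of sound events as a 0/1 vector over `labels`."""
    return [1 if name in events else 0 for name in labels]


def pairwise_presence_matrix(a, b):
    """Assumed form: the outer product of two multi-hot vectors.

    Entry [i][j] is 1 when label i occurs in recording A and label j occurs
    in recording B, so the diagonal marks the labels both recordings share.
    """
    return [[ai * bj for bj in b] for ai in a]


def shared_labels(ppm, labels=LABELS):
    """Read the co-occurring labels off the matrix diagonal."""
    return [labels[i] for i in range(len(labels)) if ppm[i][i] == 1]


def jaccard_target(a, b):
    """One possible soft similarity target for a Siamese pair (an assumption)."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    # Two recordings with no labels at all are treated as fully similar.
    return inter / union if union else 1.0


if __name__ == "__main__":
    a = multi_hot({"car_horn", "engine"})     # [1, 1, 0]
    b = multi_hot({"engine", "human_voice"})  # [0, 1, 1]
    ppm = pairwise_presence_matrix(a, b)
    print(ppm)                 # [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    print(shared_labels(ppm))  # ['engine']
    print(jaccard_target(a, b))
```

A graded target such as the Jaccard overlap lets a pair that shares some but not all events count as partially similar, which a binary same/different label used in single-label retrieval cannot express.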

Realistic recordings of soundscapes often contain multiple co-occurring sound events, such as car horns, engines, and human voices. Sound event retrieval is a type of content-based search that aims to find audio samples similar to an audio query based on their acoustic or semantic content. State-of-the-art sound event retrieval models have focused on single-label audio recordings, in which only one sound event occurs, rather than on multi-label audio recordings (i.e., recordings in which multiple sound events occur). To address this latter problem, we propose different deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated on the SONYC-UST dataset, which contains both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.
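Content-based retrieval of this kind can be sketched as query-by-example ranking in an embedding space produced by one shared encoder. In the minimal sketch below, the toy linear `embed` stands in for a trained network, and passing query and database clips through the same `weights` is the weight sharing that characterizes a Siamese structure; training is not shown. Cosine ranking, the `precision_at_k` helper, and its "shares at least one label with the query" relevance rule are assumptions chosen for illustration, not the paper's evaluation protocol, and all clip names and feature values are made up.

```python
import math


def embed(features, weights):
    """Toy shared encoder: a linear map standing in for a trained network.

    Query and database clips go through the same `weights`, which is the
    weight sharing that defines a Siamese structure.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_features, database, weights, k):
    """Return the names of the k database clips closest to the query."""
    q = embed(query_features, weights)
    scored = [(cosine(q, embed(feats, weights)), name)
              for name, feats in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def precision_at_k(retrieved, query_labels, item_labels):
    """Assumed multi-label relevance: a hit shares at least one label with the query."""
    if not retrieved:
        return 0.0
    hits = sum(1 for name in retrieved if query_labels & item_labels[name])
    return hits / len(retrieved)


if __name__ == "__main__":
    weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # hypothetical 3-D -> 2-D encoder
    database = {
        "clip_a": [0.9, 0.1, 0.0],
        "clip_b": [0.0, 1.0, 0.0],
        "clip_c": [0.5, 0.0, 1.0],
    }
    labels = {
        "clip_a": {"engine"},
        "clip_b": {"human_voice"},
        "clip_c": {"car_horn", "human_voice"},
    }
    top = retrieve([1.0, 0.0, 0.0], database, weights, k=2)
    print(top)  # ['clip_a', 'clip_c']
    print(precision_at_k(top, {"car_horn", "engine"}, labels))  # 1.0
```

With multi-label recordings, the relevance rule itself becomes a design choice: "any shared label" is lenient, whereas requiring identical label sets would make most multi-label matches count as misses.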