SD · LG · AS · Jun 21, 2021

Affinity Mixup for Weakly Supervised Sound Event Detection

arXiv:2106.11233v1
Originality: Incremental advance
AI Analysis

This paper addresses the problem of detecting sound events from weak labels, with applications such as audio analysis. It appears incremental, however, because it builds on existing attention and graph neural network concepts.

The paper tackles weakly supervised sound event detection by introducing affinity mixup, a regularization technique that incorporates time-level similarities between frames using an adaptive affinity matrix. This approach improves event-F1 scores by 8.2% over state-of-the-art methods.
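The mixing idea can be illustrated with a small sketch. It assumes the affinity matrix is a row-wise softmax over scaled dot products between frame features, and that each frame is mixed as `lam * x_t + (1 - lam) * sum_j A[t][j] * x_j`. The names `affinity_matrix`, `affinity_mixup`, `lam` and `temperature` are hypothetical. The sketch is not the authors' implementation, which applies the mixing inside the network at several layers.

```python
import math
from typing import List

Matrix = List[List[float]]  # T frames x D feature dimensions


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def affinity_matrix(features: Matrix, temperature: float = 1.0) -> Matrix:
    # Hypothetical adaptive affinity: frame-to-frame dot products, scaled by a
    # temperature and normalised per row so each row sums to 1.
    T = len(features)
    A = []
    for i in range(T):
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / temperature
            for j in range(T)
        ]
        A.append(softmax(scores))
    return A


def affinity_mixup(features: Matrix, lam: float = 0.5, temperature: float = 1.0) -> Matrix:
    # Mix each frame with an affinity-weighted average of all frames, so that
    # similar frames in the recording share information.
    A = affinity_matrix(features, temperature)
    T, D = len(features), len(features[0])
    mixed = []
    for t in range(T):
        agg = [sum(A[t][j] * features[j][d] for j in range(T)) for d in range(D)]
        mixed.append([lam * features[t][d] + (1 - lam) * agg[d] for d in range(D)])
    return mixed


frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(affinity_mixup(frames, lam=0.5))
```

Because the affinities come from the features themselves, frames that look alike pull toward each other. A frame unlike the rest is mostly left alone. Setting `lam=1.0` disables the mixing entirely.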

The weakly supervised sound event detection problem is the task of predicting the presence of sound events and their corresponding starting and ending points in a weakly labeled dataset. A weak dataset associates each training sample (a short recording) with one or more present sources. Networks that rely solely on convolutional and recurrent layers cannot directly relate multiple frames in a recording. Motivated by attention and graph neural networks, we introduce the concept of an affinity mixup to incorporate time-level similarities and make a connection between frames. This regularization technique mixes up features in different layers using an adaptive affinity matrix. Our proposed affinity mixup network improves event-F1 scores over state-of-the-art techniques by $8.2\%$.
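The gap between what a weak label provides and what the task asks for can be illustrated with a small sketch. It assumes a model outputs per-frame probabilities for one event class. Clip-level presence is taken by max pooling, which is one common choice and not necessarily the paper's. Start and end points come from thresholding contiguous runs of frames. The names `clip_presence`, `detect_events`, `threshold` and `hop_seconds` are hypothetical.

```python
from typing import List, Tuple


def clip_presence(frame_probs: List[float], threshold: float = 0.5) -> bool:
    # Clip-level (weak) decision: is the event present anywhere in the recording?
    # Max pooling over frames is an assumed choice.
    return max(frame_probs) >= threshold


def detect_events(
    frame_probs: List[float], threshold: float = 0.5, hop_seconds: float = 0.02
) -> List[Tuple[float, float]]:
    # Strong (time-level) output: (onset, offset) pairs in seconds for each
    # contiguous run of frames whose probability is at or above the threshold.
    events = []
    start = None
    for t, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = t  # event onset
        elif p < threshold and start is not None:
            events.append((start * hop_seconds, t * hop_seconds))  # event offset
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop_seconds, len(frame_probs) * hop_seconds))
    return events


probs = [0.1, 0.7, 0.8, 0.2, 0.9]
print(clip_presence(probs))                    # True
print(detect_events(probs, hop_seconds=1.0))   # [(1.0, 3.0), (4.0, 5.0)]
```

Training only sees the clip-level answer. The evaluation, however, including event-F1, scores the onset and offset pairs. That mismatch is what makes the problem weakly supervised.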

