SD · CV · AS · Mar 5, 2021

Slow-Fast Auditory Streams For Audio Recognition

arXiv:2103.03516v1 · 77 citations
Originality: Incremental advance
AI Analysis

This work addresses audio recognition for applications such as sound classification and action recognition. The advance is incremental: it adapts an existing two-stream method from visual recognition to audio.

The paper tackles audio recognition by proposing a two-stream convolutional network inspired by visual recognition, and it achieves state-of-the-art results on the VGG-Sound and EPIC-KITCHENS-100 datasets.

We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets: VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both.
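The trade-off between the two pathways can be illustrated with simple shape bookkeeping. The sketch below is not the authors' implementation: it only tracks tensor shapes, and the names `alpha` (temporal stride of the Slow pathway relative to the Fast one), `beta` (channel reduction of the Fast pathway), `lateral_channel_multiplier`, and all default values are assumptions chosen for illustration. It also assumes that a lateral connection is a time-strided convolution on Fast features followed by channel concatenation, as is common in the visual Slow-Fast design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayShape:
    """Shape of one pathway's feature map: (channels, time, frequency)."""
    channels: int
    time_steps: int
    freq_bins: int


def slow_fast_shapes(time_steps, freq_bins, slow_channels=64, alpha=4, beta=8):
    """Return (slow, fast) feature shapes for a spectrogram of the given size.

    alpha and beta are hypothetical hyperparameters, not values from the paper.
    """
    if time_steps % alpha != 0:
        raise ValueError("time_steps must be divisible by alpha")
    if slow_channels % beta != 0:
        raise ValueError("slow_channels must be divisible by beta")
    # Slow pathway: many channels, coarse temporal resolution.
    slow = PathwayShape(slow_channels, time_steps // alpha, freq_bins)
    # Fast pathway: few channels, full temporal resolution.
    fast = PathwayShape(slow_channels // beta, time_steps, freq_bins)
    return slow, fast


def fuse_lateral(slow, fast, alpha=4, lateral_channel_multiplier=2):
    """Shape after a lateral connection from Fast into Slow (assumed design).

    A convolution with temporal stride alpha brings Fast features to the
    Slow temporal length; the result is concatenated along channels.
    """
    lateral_channels = fast.channels * lateral_channel_multiplier
    lateral_time = fast.time_steps // alpha
    if lateral_time != slow.time_steps or fast.freq_bins != slow.freq_bins:
        raise ValueError("pathway shapes are not compatible for fusion")
    return PathwayShape(slow.channels + lateral_channels,
                        slow.time_steps, slow.freq_bins)


# Example: a 400-frame, 128-bin spectrogram.
slow, fast = slow_fast_shapes(time_steps=400, freq_bins=128)
fused = fuse_lateral(slow, fast)
```

With these assumed settings the Slow pathway holds 64 channels over 100 time steps, while the Fast pathway holds 8 channels over all 400 time steps, which mirrors the capacity-versus-resolution split described in the abstract.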

Code Implementations: 2 repos
