SD CV AS

Mar 5, 2021

Slow-Fast Auditory Streams For Audio Recognition

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen

arXiv:2103.03516v127.677 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses audio recognition for applications such as sound classification and action recognition. It is an incremental advance because it adapts an existing visual recognition method to audio.

The paper proposes a two-stream convolutional network for audio recognition, inspired by visual recognition, and reports state-of-the-art results on the VGG-Sound and EPIC-KITCHENS-100 datasets.

We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both.
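To make the trade-off between the two pathways concrete, the following is a minimal shape-bookkeeping sketch, not the authors' implementation. It assumes the Slow pathway sees a temporally subsampled spectrogram with many channels, the Fast pathway sees every frame with few channels, and a lateral connection brings Fast features down to the Slow temporal resolution before concatenating them along the channel axis. The names `StreamShape`, `split_spectrogram`, `pathway_shapes`, and `lateral_fuse` are hypothetical, as are the values `alpha=4`, `beta=8`, `slow_channels=64`, and `lateral_channel_mult=2`. The Fast-to-Slow direction of the lateral connection is also an assumption, borrowed from the visual design the abstract refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamShape:
    """Feature-map shape of one pathway: (channels, time frames, frequency bins)."""
    channels: int
    time: int
    freq: int


def split_spectrogram(frames, alpha=4):
    """Return (slow_input, fast_input) from a list of spectrogram frames.

    Assumption: the Slow pathway keeps every alpha-th frame, while the
    Fast pathway keeps all frames (fine-grained temporal resolution).
    """
    return frames[::alpha], list(frames)


def pathway_shapes(time_frames, freq_bins, slow_channels=64, alpha=4, beta=8):
    """Shapes after the first layer of each pathway (illustrative values only).

    alpha: temporal stride of Slow relative to Fast (hypothetical).
    beta:  channel ratio, so Fast has slow_channels // beta channels (hypothetical).
    """
    if time_frames % alpha or slow_channels % beta:
        raise ValueError("time_frames must divide by alpha, slow_channels by beta")
    slow = StreamShape(slow_channels, time_frames // alpha, freq_bins)
    fast = StreamShape(slow_channels // beta, time_frames, freq_bins)
    return slow, fast


def lateral_fuse(slow, fast, alpha=4, lateral_channel_mult=2):
    """Shape of the Slow features after a Fast-to-Slow lateral connection.

    Assumption: a time-strided convolution reduces Fast to Slow's temporal
    length and scales its channels, then the result is concatenated to Slow.
    """
    if fast.time // alpha != slow.time or fast.freq != slow.freq:
        raise ValueError("pathways are not aligned for fusion")
    lateral_channels = fast.channels * lateral_channel_mult
    return StreamShape(slow.channels + lateral_channels, slow.time, slow.freq)


if __name__ == "__main__":
    slow, fast = pathway_shapes(time_frames=400, freq_bins=128)
    print("slow :", slow)
    print("fast :", fast)
    print("fused:", lateral_fuse(slow, fast))
```

With these illustrative settings, the Slow pathway carries eight times as many channels as the Fast pathway, and the Fast pathway carries four times as many time steps. This mirrors the capacity versus temporal-resolution split described in the abstract.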
