SD · LG · AS · Mar 26, 2019

Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters

arXiv:1903.10839v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses two music information retrieval tasks, tempo estimation and key estimation. The advance is incremental: it builds on existing CNN methods and adapts them to the audio domain.

The paper tackles musical tempo and key estimation by exploiting the different semantics of a spectrogram's time and frequency axes, using CNNs with directional filters. It shows that axis-aligned architectures perform about as well as VGG-style networks while being less vulnerable to confounding factors and requiring fewer parameters.

In this article, we explore how the different semantics of spectrograms' time and frequency axes can be exploited for musical tempo and key estimation using Convolutional Neural Networks (CNNs). By addressing both tasks with the same network architectures, ranging from shallow, domain-specific approaches to deep variants with directional filters, we show that axis-aligned architectures perform about as well as common VGG-style networks developed for computer vision, while being less vulnerable to confounding factors and requiring fewer model parameters.
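
To make the idea of directional (axis-aligned) filters concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the spectrogram size, the kernel shapes (1x5, 5x1, 3x3), the channel count of 16, and the helper names `conv2d_valid` and `conv_param_count` are all assumptions chosen for illustration. It shows that a kernel spanning only the time axis leaves the frequency resolution untouched (and vice versa), and that a single directional layer has fewer weights than a square layer with the same channel counts.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """Naive 'valid' 2-D cross-correlation of x (freq, time) with kernel (kh, kw)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    # Accumulate one shifted, weighted copy of the input per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * x[i:i + out_h, j:j + out_w]
    return out


def conv_param_count(kh, kw, c_in, c_out):
    """Weights plus one bias per output channel for a standard conv layer."""
    return (kh * kw * c_in + 1) * c_out


# Toy spectrogram: 40 frequency bins x 256 time frames (illustrative sizes).
rng = np.random.default_rng(0)
spec = rng.random((40, 256))

# Hypothetical kernels: each directional kernel spans exactly one axis.
temporal_kernel = np.ones((1, 5)) / 5  # time axis only
spectral_kernel = np.ones((5, 1)) / 5  # frequency axis only
square_kernel = np.ones((3, 3)) / 9    # mixes both axes, vision-style

temporal_out = conv2d_valid(spec, temporal_kernel)  # frequency axis unchanged
spectral_out = conv2d_valid(spec, spectral_kernel)  # time axis unchanged
square_out = conv2d_valid(spec, square_kernel)      # both axes shrink

# Per-layer parameter counts for an assumed 16 -> 16 channel layer.
directional_params = conv_param_count(1, 5, 16, 16)
square_params = conv_param_count(3, 3, 16, 16)

print(temporal_out.shape, spectral_out.shape, square_out.shape)
print(directional_params, square_params)
```

The per-layer comparison only hints at the parameter savings reported in the abstract; the actual savings depend on the full architectures described in the paper.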

Code Implementations: 1 repo
