SD · May 21, 2016

Deep convolutional networks on the pitch spiral for musical instrument recognition

Vincent Lostanlen, Carmine-Emanuele Cella

arXiv:1605.06644v3 · 18.479 citations · Has Code

Originality Incremental advance

AI Analysis

This work addresses instrument recognition for music analysis, but it is incremental as it benchmarks and combines existing convolutional strategies without introducing a fundamentally new approach.

The paper tackled the problem of audio-based musical instrument recognition by developing deep convolutional networks with different weight sharing strategies in the time-frequency domain, achieving the best classification accuracy through a hybrid architecture combining temporal, time-frequency, and pitch spiral kernels.

Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.
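The three weight sharing strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the input `x` is a log-frequency (constant-Q-like) spectrogram of shape `(n_bins, n_frames)` with a hypothetical `bins_per_octave` resolution, and that the temporal kernels use one filter per frequency bin. The function names, shapes, and the use of "valid" cross-correlation (no kernel flip, as is common in deep learning libraries) are all choices made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def temporal_layer(x, w):
    """Temporal kernels: slide along time only.

    x: (n_bins, n_frames) log-frequency spectrogram.
    w: (n_bins, width), one kernel per frequency bin (an assumption of
       this sketch), so weights are shared across time but not frequency.
    """
    windows = sliding_window_view(x, w.shape[1], axis=1)  # (n_bins, T_out, width)
    return np.einsum("ktw,kw->kt", windows, w)


def time_frequency_layer(x, w):
    """Time-frequency kernels: one 2-D kernel shared over both axes.

    w: (height, width), spanning neighboring log-frequency bins and frames.
    """
    windows = sliding_window_view(x, w.shape)  # (K_out, T_out, height, width)
    return np.einsum("ktij,ij->kt", windows, w)


def spiral_layer(x, w, bins_per_octave):
    """Pitch-spiral kernels: sum of 2-D kernels placed one octave apart.

    w: (n_octaves, height, width). Kernel w[j] reads the input shifted
    up by j octaves, i.e. by j * bins_per_octave log-frequency bins.
    """
    n_octaves, height, _ = w.shape
    # Number of input rows that every octave-shifted copy can cover.
    span = x.shape[0] - (n_octaves - 1) * bins_per_octave
    if span < height:
        raise ValueError("input has too few bins for this many octaves")
    out = 0.0
    for j in range(n_octaves):
        start = j * bins_per_octave
        out = out + time_frequency_layer(x[start:start + span], w[j])
    return out
```

With a single octave (`w` of shape `(1, height, width)`), `spiral_layer` reduces to `time_frequency_layer`, which makes the spiral strategy easy to read as the time-frequency strategy extended across octaves. A hybrid architecture in the spirit of the abstract would compute all three outputs and feed them jointly to later layers; how the paper combines them is not specified here.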
