SD, LG, AS | Jul 3, 2019

A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features

arXiv:1907.01813v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This paper addresses the under-researched problem of CNN explainability for researchers in music information retrieval, but it is incremental: it applies existing similarity techniques to a specific domain.

The study tackled the explainability of CNNs in music and audio by comparing deep-learned activations with hand-crafted audio features for musical instrument recognition, finding similarities such as shallow layers aligning with harmonic and percussive components and deeper layers with chromagrams and other features.

The explainability of Convolutional Neural Networks (CNNs) is a particularly challenging task in all areas of application, and it is notably under-researched in the music and audio domain. In this paper, we approach explainability by exploiting the knowledge we have of hand-crafted audio features. Our study focuses on a well-defined MIR task: the recognition of musical instruments from user-generated music recordings. We compute the similarity between a set of traditional audio features and representations learned by CNNs. We also propose a technique for measuring the similarity between activation maps and audio features that are typically presented in the form of a matrix, such as chromagrams or spectrograms. We observe that some neurons' activations correspond to well-known classical audio features. In particular, for shallow layers, we found similarities between activations and the harmonic and percussive components of the spectrum. For deeper layers, we compare chromagrams with high-level activation maps, as well as loudness and onset rate with deep-learned embeddings.
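
The abstract does not spell out how activation maps are compared with matrix-shaped features, so the following is only a minimal sketch of one plausible approach, not the authors' technique. It assumes the activation map is resampled to the feature's shape by nearest-neighbour lookup and that similarity is the Pearson correlation of the flattened matrices. The function names `resize_nearest` and `matrix_similarity` and the random stand-in data are hypothetical.

```python
import numpy as np

def resize_nearest(matrix, shape):
    """Resample a 2-D array to `shape` by nearest-neighbour index lookup."""
    rows = np.arange(shape[0]) * matrix.shape[0] // shape[0]
    cols = np.arange(shape[1]) * matrix.shape[1] // shape[1]
    return matrix[np.ix_(rows, cols)]

def matrix_similarity(activation, feature):
    """Pearson correlation between an activation map and a feature matrix.

    Assumption: both arrays share the same axis meaning (e.g. bins x frames),
    so resampling the activation to the feature's shape is meaningful.
    """
    resized = resize_nearest(activation, feature.shape)
    a = resized.ravel() - resized.mean()
    f = feature.ravel() - feature.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(f)
    # Constant inputs have no defined correlation; report 0 in that case.
    return float(a @ f / denom) if denom > 0 else 0.0

# Hypothetical stand-ins: a 12-bin "chromagram" and two 24-row "activation maps".
rng = np.random.default_rng(0)
chroma = rng.random((12, 100))
activation_like = np.repeat(chroma, 2, axis=0)  # each chroma row duplicated
noise = rng.random((24, 100))                   # unrelated activations

sim_related = matrix_similarity(activation_like, chroma)
sim_unrelated = matrix_similarity(noise, chroma)
```

In this toy setup the duplicated-row map scores close to 1 and the unrelated map scores close to 0. With real data, the feature matrix would come from an audio analysis library and the activation map from a trained network, and a different resampling or similarity measure may suit the data better.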

