SDAS, Nov 11, 2019

Visualizing and Understanding Self-attention based Music Tagging

arXiv:1911.04385v1, 3 citations
Originality: Synthesis-oriented
AI Analysis

This work matters to researchers and practitioners in music information retrieval who need interpretable models. It is incremental, however: it builds on a self-attention model the authors proposed earlier.

The paper tackles the problem of interpreting self-attention in music tagging models. It visualizes how such a model processes music as a temporal sequence rather than as an image, and reports that this view makes the model's behavior easier to interpret.

Recently, we proposed a self-attention based music tagging model. Unlike most conventional deep architectures in music information retrieval, which treat music spectrograms as images and apply stacked 3x3 filters, the proposed model regards music as a temporal sequence of individual audio events. Beyond its tagging performance, this design can also make the model easier to interpret. In this paper, we focus on visualizing and understanding the proposed self-attention based music tagging model.
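To make the contrast with stacked 3x3 filters concrete, here is a minimal sketch of generic single-head scaled dot-product self-attention over a sequence of audio frames, assuming each frame has already been summarized as a feature vector. It is not the authors' architecture: the frame features, weight matrices, and helper names (`self_attention`, `random_matrix`, and so on) are hypothetical and chosen only for illustration. The returned T x T attention matrix is the kind of quantity one would plot to see which time steps a model relates to each other.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def self_attention(frames: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over time steps.

    frames: T x d_in, one (hypothetical) feature vector per time step.
    Returns (outputs T x d_v, attention T x T), where attention[t][s] is
    how strongly time step t draws on time step s.
    """
    q = matmul(frames, wq)
    k = matmul(frames, wk)
    v = matmul(frames, wv)
    d_k = len(wk[0])
    scores = [[x / math.sqrt(d_k) for x in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)
    return matmul(attn, v), attn


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random stand-in values; a trained model would learn these weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, d_in, d_k = 6, 4, 3
    frames = random_matrix(T, d_in, rng)  # placeholder per-frame audio features
    wq, wk, wv = (random_matrix(d_in, d_k, rng) for _ in range(3))
    outputs, attn = self_attention(frames, wq, wk, wv)
    for t, row in enumerate(attn):
        peak = max(range(T), key=lambda s: row[s])
        print(f"frame {t}: attends most to frame {peak} ({row[peak]:.2f})")
```

Each row of `attn` sums to 1, so a heatmap of it reads as "for frame t, how is attention spread over all frames". With random weights the pattern is meaningless; in a trained tagging model, peaks would suggest which audio events the model links when predicting tags.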
