SDLGAS · May 11, 2023

Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization

arXiv:2305.07132v1 · 13 citations
Originality: Incremental advance
AI Analysis

The paper addresses interpretability for audio processing networks and enables intuitive, listenable explanations for end users. The contribution is incremental, however, because it builds on existing NMF-based methods.

The paper tackles interpretability in audio classification networks by proposing a novel interpreter design that uses non-negative matrix factorization (NMF) to generate listenable, audio-based interpretations in both post-hoc and by-design settings. It demonstrates the approach on multi-label real-world audio and music tasks.

This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
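To make the mechanism in the abstract more concrete, here is a minimal sketch in NumPy, assuming Euclidean NMF with Lee-Seung multiplicative updates and synthetic non-negative matrices standing in for magnitude spectrograms. It is not the authors' implementation. In the paper, the time-activations are produced by a trained interpreter from the target network's hidden layers; here they are simply fitted by NMF with the pre-learnt dictionary held fixed. The helper names `nmf` and `interpret`, the classifier weights, and the relevance score (weight times mean activation) are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf(V, n_components=None, W=None, n_iter=300, seed=0, eps=1e-10):
    """Euclidean NMF with Lee-Seung multiplicative updates: V ~= W @ H.

    If a dictionary W is passed it stays fixed and only the
    time-activations H are estimated (as with a pre-learnt dictionary).
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], n_components)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def interpret(V, W, H, relevance, top_k=1, eps=1e-10):
    """Keep the top_k most relevant components and apply them as a soft mask.

    Returns the part of V explained by the selected components; combined
    with the input phase, it could be inverted back to a listenable signal.
    """
    keep = np.argsort(relevance)[::-1][:top_k]
    selected = W[:, keep] @ H[keep, :]
    mask = selected / (W @ H + eps)  # non-negative and strictly below 1
    return mask * V

# Synthetic stand-ins for magnitude spectrograms (frequency x time).
rng = np.random.default_rng(1)
F, T, K = 64, 40, 3
W_true = rng.random((F, K)) + 0.1
V_train = W_true @ (rng.random((K, T)) + 0.1)
V = W_true @ (rng.random((K, 20)) + 0.1)   # a new input excerpt

W, _ = nmf(V_train, n_components=K)        # "pre-learnt" dictionary
_, H = nmf(V, W=W)                         # activations, W held fixed

# Hypothetical relevance: weights of an imagined linear classifier head
# multiplied by each component's mean activation over time.
clf_weights = np.array([0.2, 1.5, -0.3])
relevance = clf_weights * H.mean(axis=1)
explanation = interpret(V, W, H, relevance, top_k=1)
```

Because the mask is a ratio of non-negative reconstructions, the explanation can only attenuate the input, never add energy to it. Keeping every component returns (almost exactly) the original spectrogram.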

Code implementations: 1 repo
