AS · AI · SD · Jun 19, 2024

Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

arXiv:2406.13385v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for explainable AI in domains such as health and forensics, where neural networks typically operate as black boxes. Its contribution is incremental: it applies NMF to make the segmentation model's latent representation interpretable.

The paper tackles audio segmentation with an explainable-by-design model that uses non-negative matrix factorization (NMF) to produce interpretable representations. It reports good segmentation performance and provides in-depth analyses of the latent representation extracted from the non-negative matrix.

Audio segmentation is a key task for many speech technologies, most of which rely on neural networks that perform well but are usually regarded as black boxes. However, in many domains, such as health and forensics, good performance is not enough: explanations of the output decision are also needed. To be interpretable, explanations derived directly from latent representations need to satisfy "good" properties such as informativeness, compactness, and modularity. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for the design of interpretable representations. We show that our model reaches good segmentation performance, and we present in-depth analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives toward the evaluation of interpretable representations according to "good" properties.
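For readers unfamiliar with NMF, the following is a minimal sketch of the general technique, not the model proposed in the paper. It factors a small non-negative "spectrogram" V into spectral templates W and per-frame activations H using the classic multiplicative updates for the Frobenius loss, then labels each frame by its strongest component. The toy matrix, the rank of 2, and all function names (`nmf`, `scale`, `frob_error`) are assumptions made for illustration. It is written in plain Python so it runs without dependencies; in practice a numerical library would be the usual choice.

```python
import random

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def scale(M, num, den, eps):
    """Element-wise multiplicative update M * num / (den + eps)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative V (bins x frames) as W (bins x rank) times
    H (rank x frames) with multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n_bins, n_frames = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    H = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H

# Hypothetical toy spectrogram: 4 frequency bins x 6 frames.
# Frames 0-2 contain one spectral pattern, frames 3-5 another.
V = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.8, 0.8],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
]

W, H = nmf(V, rank=2)

# Naive frame labelling: index of the most active component in each frame.
labels = [max(range(2), key=lambda k: H[k][t]) for t in range(len(V[0]))]
print("reconstruction error:", frob_error(V, W, H))
print("frame labels:", labels)
```

Because W and H are constrained to be non-negative, each column of W can be read as an additive spectral pattern and each row of H as how strongly that pattern is present over time, which is the property that makes NMF attractive for interpretable representations. The argmax labelling is only a stand-in for a real segmentation decision.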

Code Implementations: 1 repo