MusicLIME: Explainable Multimodal Music Understanding
MusicLIME addresses the need for fairness, bias reduction, and trust in music understanding systems, though the contribution is incremental: it adapts existing explanation techniques to a multimodal context.
The paper tackles the problem of explainability in multimodal music models by introducing MusicLIME, a model-agnostic feature importance method that reveals interactions between audio and lyrical features, providing holistic explanations and aggregating them into global insights.
Multimodal models are critical for music understanding tasks, as they capture the complex interplay between audio and lyrics. However, as these models become more prevalent, the need for explainability grows: understanding how these systems make decisions is vital for ensuring fairness, reducing bias, and fostering trust. In this paper, we introduce MusicLIME, a model-agnostic feature importance explanation method designed for multimodal music models. Traditional unimodal methods analyze each modality separately without considering the interaction between them, which often leads to incomplete or misleading explanations. In contrast, MusicLIME reveals how audio and lyrical features interact and contribute to predictions, providing a holistic view of the model's decision-making. Additionally, we enhance local explanations by aggregating them into global explanations, giving users a broader perspective of model behavior. Through this work, we contribute to improving the interpretability of multimodal music models, empowering users to make informed choices, and fostering more equitable, fair, and transparent music understanding systems.
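To make the idea of a model-agnostic, multimodal feature importance explanation more concrete, the following is a minimal LIME-style sketch, not the paper's implementation. It assumes that audio and lyrical features can each be switched on or off through a binary mask, that the black-box model is available as a function of those two masks, and that global explanations are formed by a plain average of local weights. The names `explain_multimodal`, `aggregate_global`, and `toy_predict`, the exponential proximity kernel, and the ridge surrogate are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np

def explain_multimodal(predict_fn, n_audio, n_text, n_samples=500,
                       kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a weighted linear surrogate over a joint audio+lyrics mask.

    predict_fn is a hypothetical black box: it takes (audio_mask, text_mask),
    two 0/1 arrays, and returns a scalar score for the class of interest.
    """
    rng = np.random.default_rng(seed)
    d = n_audio + n_text
    # Perturb both modalities together so the surrogate sees joint variation.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # keep the unperturbed instance in the sample
    y = np.array([predict_fn(m[:n_audio], m[n_audio:]) for m in masks],
                 dtype=float)
    # Cosine distance between each mask and the all-ones (original) mask.
    dist = 1.0 - np.sqrt(masks.sum(axis=1) / d)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted ridge regression with an unpenalized intercept column.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    XtW = X.T * weights
    A = XtW @ X + ridge * np.eye(d + 1)
    A[-1, -1] -= ridge
    beta = np.linalg.solve(A, XtW @ y)
    return beta[:n_audio], beta[n_audio:d]

def aggregate_global(local_explanations):
    """Average local weights per feature name (an assumed aggregation rule)."""
    totals, counts = {}, {}
    for explanation in local_explanations:
        for name, weight in explanation.items():
            totals[name] = totals.get(name, 0.0) + weight
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}

def toy_predict(audio_mask, text_mask):
    # Hypothetical model: driven by audio feature 0 and lyric feature 1.
    return 2.0 * audio_mask[0] - 1.0 * text_mask[1]
```

Calling `explain_multimodal(toy_predict, n_audio=3, n_text=4)` returns one weight vector per modality on a shared scale, so an audio feature and a lyric word can be ranked against each other in a single explanation. Passing a list of per-song dictionaries (feature name to local weight) to `aggregate_global` yields one averaged weight per feature across songs.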