LG, Feb 2

LEMON: Local Explanations via Modality-aware OptimizatioN

arXiv:2602.02786v1
AI Analysis

This work addresses the need for efficient, unified explanations in multimodal AI, though it is incremental in that it builds on existing explainability methods.

The paper tackles the problem of explaining multimodal model predictions by introducing LEMON, a model-agnostic framework for local explanations. It reports competitive faithfulness while using 35-67 times fewer black-box evaluations and 2-8 times less runtime than the baselines.

Multimodal models are ubiquitous, yet existing explainability methods are often single-modal, architecture-dependent, or too computationally expensive to run at scale. We introduce LEMON (Local Explanations via Modality-aware OptimizatioN), a model-agnostic framework for local explanations of multimodal predictions. LEMON fits a single modality-aware surrogate with group-structured sparsity to produce unified explanations that disentangle modality-level contributions and feature-level attributions. The approach treats the predictor as a black box and is computationally efficient, requiring relatively few forward passes while remaining faithful under repeated perturbations. We evaluate LEMON on vision-language question answering and a clinical prediction task with image, text, and tabular inputs, comparing against representative multimodal baselines. Across backbones, LEMON achieves competitive deletion-based faithfulness while reducing black-box evaluations by 35-67 times and runtime by 2-8 times compared to strong multimodal baselines.
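The abstract does not spell out LEMON's optimization procedure, so the following is only a minimal sketch of the general idea it names: a single local surrogate fitted on black-box outputs with a group-structured sparsity penalty, one group per modality. It swaps in a plain group lasso solved by proximal gradient descent, which may differ from the paper's actual method. The names `fit_group_sparse_surrogate`, `group_soft_threshold`, and `toy_model`, the random binary masking scheme, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Proximal operator of the group-lasso penalty: shrink each group's L2 norm."""
    out = np.zeros_like(w)
    for idx in groups.values():
        norm = np.linalg.norm(w[idx])
        if norm > thresh:
            out[idx] = w[idx] * (1.0 - thresh / norm)
    return out

def fit_group_sparse_surrogate(black_box, groups, n_features,
                               n_samples=200, lam=0.05, n_iters=500, seed=0):
    """Fit a linear surrogate on random feature masks (hypothetical setup).

    black_box: callable taking a 0/1 mask (1 = feature kept) and returning a score.
    groups:    dict mapping modality name -> array of feature indices.
    """
    rng = np.random.default_rng(seed)
    # Each row is one perturbation; each row costs one black-box evaluation.
    Z = rng.integers(0, 2, size=(n_samples, n_features)).astype(float)
    y = np.array([black_box(z) for z in Z])

    # Center the data so no intercept term is needed.
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()

    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the loss (1 / 2n) * ||Zc w - yc||^2.
    step = n_samples / (np.linalg.norm(Zc, 2) ** 2)

    w = np.zeros(n_features)
    for _ in range(n_iters):
        grad = Zc.T @ (Zc @ w - yc) / n_samples
        w = group_soft_threshold(w - step * grad, groups, step * lam)

    # Modality-level contribution = L2 norm of that modality's weights;
    # feature-level attribution = the individual weights in w.
    modality_scores = {name: float(np.linalg.norm(w[idx]))
                       for name, idx in groups.items()}
    return w, modality_scores

# Toy stand-in for a multimodal model: only two "image" features matter.
groups = {"image": np.arange(0, 4), "text": np.arange(4, 8)}

def toy_model(mask):
    return 2.0 * mask[0] + 1.0 * mask[1]

w, scores = fit_group_sparse_surrogate(toy_model, groups, n_features=8)
print(scores)
```

In this toy setup the group penalty can drive an entire modality to zero, while the individual weights within a surviving group serve as feature-level attributions. That is one way to obtain both levels of explanation from a single fit.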
