LG | Apr 29, 2025

Explanations Go Linear: Interpretable and Individual Latent Encoding for Post-hoc Explainability

arXiv:2504.20667v2 | h-index: 63 | ICDM
Originality: Incremental advance
AI Analysis

This work addresses interpretability in machine learning for users who need reliable explanations, though it appears incremental because it builds on existing surrogate methods.

The paper tackles the limitations of surrogate-based post-hoc explainability methods by introducing ILLUME, a framework that combines a global surrogate with instance-specific linear transformations to generate accurate, robust, and faithful local and global explanations for black-box classifiers.

Post-hoc explainability is essential for understanding black-box machine learning models. Surrogate-based techniques are widely used for local and global model-agnostic explanations but have significant limitations. Local surrogates capture non-linearities but are computationally expensive and sensitive to parameters, while global surrogates are more efficient but struggle with complex local behaviors. In this paper, we present ILLUME, a flexible and interpretable framework grounded in representation learning that can be integrated with various surrogate models to provide explanations for any black-box classifier. Specifically, our approach combines a globally trained surrogate with instance-specific linear transformations learned with a meta-encoder to generate both local and global explanations. Through extensive empirical evaluations, we demonstrate the effectiveness of ILLUME in producing feature attributions and decision rules that are not only accurate but also robust and faithful to the black box, thus providing a unified explanation framework that effectively addresses the limitations of traditional surrogate methods.
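The abstract describes explanations obtained by pairing one globally trained surrogate with a per-instance linear transformation. The sketch that follows is one illustrative reading of that idea under stated assumptions, not the authors' implementation: `toy_meta_encoder` is a hypothetical placeholder for the learned meta-encoder (a fixed diagonal rule, no training), and the global surrogate is assumed to be linear with made-up weights `w` and bias `b`. It shows why a linear latent map keeps feature attributions additive: the surrogate's score decomposes exactly into per-feature terms plus the bias.

```python
import math


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def toy_meta_encoder(x):
    """HYPOTHETICAL stand-in for a learned meta-encoder.

    Returns an instance-specific d x d matrix A(x). Here it is a fixed
    diagonal rule chosen for illustration, not a trained network.
    """
    d = len(x)
    return [[(1.0 + 0.5 * math.tanh(x[i])) if i == j else 0.0
             for j in range(d)] for i in range(d)]


def local_explanation(x, w, b):
    """Encode x as z = A(x) x, score z with an assumed global linear
    surrogate f(z) = w . z + b, and attribute the score to input features."""
    A = toy_meta_encoder(x)
    z = matvec(A, x)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    # Effective local weights w^T A(x): one coefficient per input feature.
    d = len(x)
    local_w = [sum(w[i] * A[i][j] for i in range(d)) for j in range(d)]
    # Per-feature attribution = effective local weight * feature value.
    attributions = [lw * xj for lw, xj in zip(local_w, x)]
    return score, attributions


# Illustrative (made-up) surrogate parameters and input instance.
w = [0.8, -0.4, 0.2]
b = 0.1
x = [1.0, 2.0, -0.5]
score, attr = local_explanation(x, w, b)
print(round(score, 4), [round(a, 4) for a in attr])
```

Because the transformation is linear in `x` once the instance is fixed, the attributions always sum to the surrogate's score minus the bias. That is the kind of faithfulness check a linear latent encoding makes straightforward; a real meta-encoder would instead be trained to produce `A(x)`.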
