LG | AI | Jan 30

ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations

arXiv:2601.23068v1 | 1 citation | h-index: 4 | Has Code

Originality: Incremental advance

AI Analysis

ExplainerPFN addresses the need for model-free interpretability in tabular data analysis, offering a zero-shot approach that could improve transparency in domains such as healthcare and finance. The advance is incremental, since it builds on an existing tabular foundation model (TabPFN).

The paper tackles the problem of estimating Shapley values for feature importance in supervised classification without access to the underlying model. Model access is often unavailable in real-world deployments, and exact Shapley computation can be prohibitively expensive even when access is possible. The paper introduces ExplainerPFN, a tabular foundation model whose zero-shot attributions are competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.

Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but require direct access to the underlying model, an assumption frequently violated in real-world deployments. Further, even when model access is possible, their exact computation may be prohibitively expensive. We investigate whether meaningful Shapley value estimations can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN that is pretrained on synthetic datasets generated from random structural causal models and supervised using exact or near-exact Shapley values. Once trained, ExplainerPFN predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations. Our contributions are fourfold: (1) we show that few-shot learning-based explanations can achieve high fidelity to SHAP values with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley values without access to the underlying model or reference explanations; (3) we provide an open-source implementation of ExplainerPFN, including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.
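To make concrete why Shapley values need model access and become expensive, here is a minimal sketch of exact Shapley computation by brute-force enumeration of feature coalitions. It is not the paper's code: the names `exact_shapley`, `predict`, and `background` are hypothetical, and the choice to replace "absent" features with values from a background sample is an assumption made for illustration. Every coalition requires calls to `predict`, and each feature has 2^(n-1) coalitions to evaluate, which is the cost ExplainerPFN aims to avoid.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance x (illustrative sketch).

    predict    -- callable mapping a list of feature values to a number
                  (this is the model access a zero-shot method does without)
    x          -- the instance to explain, as a list of feature values
    background -- list of reference rows used to fill in absent features
                  (an assumed value function, chosen for simplicity)
    """
    n = len(x)

    def value(subset):
        # Average prediction when features in `subset` are fixed to x
        # and all other features are taken from each background row.
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear_model(z):
    # Toy stand-in for a black-box model.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


if __name__ == "__main__":
    attributions = exact_shapley(
        linear_model, [3.0, 1.0, 5.0], [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
    )
    print(attributions)
```

For a linear model the attributions reduce to each weight times the feature's deviation from the background mean, which makes this toy case easy to check by hand. The attributions also sum to the gap between the prediction for `x` and the average prediction over `background`.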
