CL · Jan 10, 2025

ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability

Antonin Poché, Alon Jacovi, Agustin Martin Picard, Victor Boutin, Fanny Jourdan

arXiv:2501.05855v4 · 13.99 citations · h-index: 18 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses the challenge of evaluating concept-based explanations for machine learning models, which matters for interpretability in AI applications. The advance is incremental: it automates existing simulatability methods rather than introducing a new evaluation principle.

The paper introduces an automated simulatability framework in which large language models act as simulators that predict the explained model's outputs from the provided explanations. This enables scalable and consistent assessment across models and datasets, and the results show that LLMs produce consistent rankings of explanation methods.

Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is very difficult, as it includes not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator's ability to predict the explained model's outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to enact, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language models (LLMs) as simulators to approximate the evaluation and report various analyses to make such approximations reliable. Our method allows for scalable and consistent evaluation across various models and datasets. We report a comprehensive empirical evaluation using this framework and show that LLMs provide consistent rankings of explanation methods. Code available at https://github.com/AnonymousConSim/ConSim.
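As a rough illustration of the simulatability idea described above, the sketch below scores a simulator by how often it reproduces the explained model's outputs, with and without explanations. It is not the ConSim implementation: the `simulatability` function, the `toy_simulator` stand-in for an LLM call, the toy data, and the comparison against a no-explanation baseline are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical simulator signature: (input text, optional explanation) -> predicted label.
# In an automated setup this would wrap an LLM call; here it is a plain function.
Simulator = Callable[[str, Optional[str]], str]


def simulatability(
    simulator: Simulator,
    inputs: Sequence[str],
    model_outputs: Sequence[str],
    explanations: Optional[Sequence[str]] = None,
) -> float:
    """Fraction of inputs on which the simulator reproduces the model's output."""
    if not inputs or len(inputs) != len(model_outputs):
        raise ValueError("inputs and model_outputs must be non-empty and aligned")
    if explanations is None:
        # No-explanation baseline: the simulator sees only the input.
        explanations = [None] * len(inputs)
    hits = sum(
        simulator(x, e) == y
        for x, e, y in zip(inputs, explanations, model_outputs)
    )
    return hits / len(inputs)


def toy_simulator(text: str, explanation: Optional[str]) -> str:
    # Stand-in for an LLM: follows the explanation when given,
    # otherwise guesses from a single keyword in the input.
    if explanation is not None:
        return "positive" if "praise" in explanation else "negative"
    return "positive" if "good" in text else "negative"


# Toy data (invented for illustration): outputs of the model being explained.
inputs = ["good plot", "not good at all", "dull", "a real gem"]
model_outputs = ["positive", "negative", "negative", "positive"]
explanations = [
    "concept: praise",
    "concept: complaint",
    "concept: complaint",
    "concept: praise",
]

baseline = simulatability(toy_simulator, inputs, model_outputs)
explained = simulatability(toy_simulator, inputs, model_outputs, explanations)
gain = explained - baseline  # how much the explanations helped the simulator
```

Under these assumptions, an explanation method would rank higher when its explanations raise the simulator's agreement with the model further above the baseline.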
