AI · CL · Oct 28, 2024

Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments

arXiv:2410.21131v3 · 11 citations · h-index: 7 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses the need for more human-centric and scalable evaluation of counterfactual explanations in explainable AI, though it is an incremental improvement on existing evaluation frameworks.

The paper tackles the evaluation of counterfactual explanations by using Large Language Models (LLMs) to predict human judgments, reaching up to 63% accuracy in zero-shot evaluation and 85% with fine-tuning (on a 3-class prediction task) across 8 metrics.

As machine learning models evolve, maintaining transparency demands more human-centric explainable AI techniques. Counterfactual explanations, with roots in human reasoning, identify the minimal input changes needed to obtain a given output and, hence, are crucial for supporting decision-making. Despite their importance, the evaluation of these explanations often lacks grounding in user studies and remains fragmented, with existing metrics not fully capturing human perspectives. To address this challenge, we developed a diverse set of 30 counterfactual scenarios and collected ratings across 8 evaluation metrics from 206 respondents. Subsequently, we fine-tuned different Large Language Models (LLMs) to predict average or individual human judgments across these metrics. Our methodology allowed LLMs to achieve an accuracy of up to 63% in zero-shot evaluations and 85% (on a 3-class prediction task) with fine-tuning across all metrics. The fine-tuned models predicting human ratings offer better comparability and scalability in evaluating different counterfactual explanation frameworks.
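To make the reported accuracy figure concrete, here is a minimal Python sketch of how a 3-class accuracy against averaged human ratings could be computed. It is only an illustration and not the authors' pipeline: the 1 to 5 rating scale, the cut-offs in `to_class`, the class names, and the toy data are all assumptions, since the abstract does not specify them.

```python
from statistics import mean


def to_class(rating, low_cut=2.5, high_cut=3.5):
    """Map a rating to one of three classes.

    The 1-5 scale and both cut-offs are assumptions for illustration.
    """
    if rating < low_cut:
        return "low"
    if rating > high_cut:
        return "high"
    return "medium"


def accuracy(human_ratings, predicted_classes):
    """Fraction of scenarios whose predicted class matches the class
    of the mean human rating for that scenario."""
    targets = [to_class(mean(ratings)) for ratings in human_ratings]
    hits = sum(t == p for t, p in zip(targets, predicted_classes))
    return hits / len(targets)


# Toy data: three scenarios, each rated by three hypothetical respondents
# on a single metric, plus hypothetical model predictions.
human = [[1, 2, 2], [3, 3, 4], [5, 4, 5]]
predicted = ["low", "medium", "medium"]
score = accuracy(human, predicted)
```

In this toy case the model matches two of the three averaged human classes. Predicting individual rather than average judgments would compare against each respondent's own rating instead of the mean.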

Code Implementations: 1 repo
