LG · Aug 5, 2024

Backward Compatibility in Attributive Explanation and Enhanced Model Training Method

arXiv:2408.02298v1 · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the instability of explanations across model updates in real-world ML/AI applications. Its contribution is incremental, since it builds on existing explanation methods.

The paper tackles the problem of model updates causing detrimental changes in feature attribution explanations. It introduces BCX, a metric for evaluating backward compatibility, and BCXR, a training method that improves agreement between the explanations of pre- and post-update models. On eight real-world datasets, BCXR is reported to achieve better trade-offs between predictive performance and BCX scores.

Model update is a crucial process in the operation of ML/AI systems. While updating a model generally enhances the average prediction performance, it also significantly impacts the explanations of predictions. In real-world applications, even minor changes in explanations can have detrimental consequences. To tackle this issue, this paper introduces BCX, a quantitative metric that evaluates the backward compatibility of feature attribution explanations between pre- and post-update models. BCX uses practical agreement metrics to calculate the average agreement between the explanations of the pre- and post-update models, restricted to samples that both models predict correctly. In addition, we propose BCXR, a BCX-aware model training method built on surrogate losses that theoretically lower-bound the agreement scores. Furthermore, we present a universal variant of BCXR that improves all agreement metrics by using the L2 distance between the explanations of the two models. To validate our approach, we conducted experiments on eight real-world datasets, demonstrating that BCXR achieves superior trade-offs between predictive performance and BCX scores.
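
As a rough illustration of the abstract's description, the sketch below averages an agreement score over the samples that both models predict correctly, and also computes a mean squared L2 distance between explanations of the kind the universal variant is said to use. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (topk_agreement, bcx, l2_explanation_penalty), the choice of top-k feature overlap as the agreement metric, and the use of precomputed attribution arrays are all hypothetical. The abstract does not specify the agreement metrics or the surrogate losses.

```python
import numpy as np


def topk_agreement(e_old, e_new, k=3):
    """Fraction of the top-k features (by absolute attribution) shared by
    two explanation vectors. Assumed agreement metric, chosen for illustration."""
    top_old = set(np.argsort(-np.abs(e_old))[:k].tolist())
    top_new = set(np.argsort(-np.abs(e_new))[:k].tolist())
    return len(top_old & top_new) / k


def bcx(y_true, pred_old, pred_new, expl_old, expl_new, agreement=topk_agreement):
    """Average explanation agreement over samples that BOTH the pre-update
    and post-update models predict correctly (hypothetical helper)."""
    both_correct = (pred_old == y_true) & (pred_new == y_true)
    if not both_correct.any():
        return float("nan")  # no commonly correct samples to compare
    scores = [
        agreement(a, b)
        for a, b in zip(expl_old[both_correct], expl_new[both_correct])
    ]
    return float(np.mean(scores))


def l2_explanation_penalty(expl_old, expl_new):
    """Mean squared L2 distance between paired explanations. A penalty of this
    shape could be added to a training loss; the weighting is left open here."""
    return float(np.mean(np.sum((expl_old - expl_new) ** 2, axis=1)))
```

In this sketch, expl_old and expl_new are arrays of shape (n_samples, n_features) produced by any feature attribution method applied to the old and new models. A score of 1.0 from bcx would mean the two models highlight the same top-k features on every commonly correct sample.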

