Contrastive Explanations for Explaining Model Adaptations
The work addresses the need for transparency in adaptive AI systems, an incremental step in interpretability research.
The paper tackles the problem of explaining model adaptations in non-static AI systems by proposing a framework for contrastive explanations and a method to identify affected data regions, with empirical evaluation.
Many decision-making systems deployed in the real world are not static: they change over time, a phenomenon known as model adaptation. The need for transparency and interpretability of AI-based decision models is widely accepted and has thus been studied extensively. Explanation methods usually assume that the system to be explained is static. Explaining non-static systems remains an open research question, which raises the challenge of how to explain model adaptations. In this contribution, we propose and empirically evaluate a framework for explaining model adaptations by contrastive explanations. We also propose a method for automatically finding regions in data space that are affected by a given model adaptation and thus should be explained.
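
To make the idea of "regions in data space affected by a model adaptation" concrete, the following minimal sketch flags the samples on which an old and an adapted model disagree and summarizes them with an axis-aligned bounding box. It is an illustration only and not the method proposed in the paper: the toy threshold models, the grid of samples, and the helper names `affected_samples` and `bounding_box` are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Model = Callable[[Point], int]


def affected_samples(old: Model, new: Model, samples: List[Point]) -> List[Point]:
    """Return the samples whose predicted label changes under the adaptation."""
    return [x for x in samples if old(x) != new(x)]


def bounding_box(points: List[Point]) -> List[Tuple[float, float]]:
    """Summarize a set of points by a (min, max) interval per feature."""
    if not points:
        return []
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points))
            for d in range(dims)]


# Hypothetical models: the adaptation moves a decision threshold
# on the first feature from 0.5 to 0.7.
def old_model(x: Point) -> int:
    return int(x[0] > 0.5)


def new_model(x: Point) -> int:
    return int(x[0] > 0.7)


# Hypothetical data: a regular grid over the unit square.
samples = [(i / 10, j / 10) for i in range(11) for j in range(11)]

changed = affected_samples(old_model, new_model, samples)
box = bounding_box(changed)

print(len(changed), "of", len(samples), "samples change their prediction")
print("affected region (per-feature interval):", box)
```

In this toy setting only samples whose first feature lies between the two thresholds change their label, so the bounding box is narrow in the first feature and spans the full range of the second. A contrastive explanation would then focus on such samples, comparing how the old and the adapted model each justify their prediction.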