Counterfactual Explanations of Concept Drift
This work addresses the need for human-understandable explanations of concept drift, which can improve acceptance of life-long learning models. The contribution is incremental in that it builds on existing drift detection and counterfactual explanation methods.
The paper tackles the problem of explaining concept drift, which occurs when the data distribution changes over time and can make trained models inaccurate. It introduces a method that characterizes drift through counterfactual explanations, highlighting the features in which the change is most prominent. The paper provides a formal definition of the problem, an efficient algorithm, and several examples that demonstrate the method's usefulness.
The notion of concept drift refers to the phenomenon that the distribution underlying the observed data changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. While methods exist to detect concept drift or to adjust models in the presence of observed drift, the question of explaining drift has hardly been considered so far. This problem is important because an explanation enables inspection of the most prominent features in which drift manifests itself; it thereby helps humans understand why change is necessary and increases acceptance of life-long learning models. In this paper we present a novel technology that characterizes concept drift in terms of the characteristic change of spatial features, represented by typical examples and based on counterfactual explanations. We establish a formal definition of this problem, derive an efficient algorithmic solution based on counterfactual explanations, and demonstrate its usefulness in several examples.
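To make the idea concrete, the following is a minimal sketch of how a counterfactual can point at the features in which drift shows up. It is not the paper's algorithm. It assumes a simple setup chosen for illustration: synthetic two-feature data, a nearest-centroid classifier that separates samples observed before the drift from samples observed after it, and a closed-form counterfactual for that linear classifier. All function names (fit_time_classifier, score, counterfactual) and the data are hypothetical.

```python
import random


def mean(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_time_classifier(before, after):
    """Nearest-centroid classifier (an assumption, not the paper's model).

    score(x) > 0 means x is closer to the centroid of the after-drift data.
    """
    mu_b, mu_a = mean(before), mean(after)
    w = [a - b for a, b in zip(mu_a, mu_b)]
    bias = -(dot(mu_a, mu_a) - dot(mu_b, mu_b)) / 2.0
    return w, bias


def score(x, w, bias):
    return dot(w, x) + bias


def counterfactual(x, w, bias, margin=1e-3):
    """Smallest Euclidean change that moves x just across the linear boundary."""
    step = score(x, w, bias) / dot(w, w)
    # Project onto the boundary, then overshoot slightly so the label flips.
    return [xi - (1.0 + margin) * step * wi for xi, wi in zip(x, w)]


# Synthetic data (assumption): only feature 0 drifts, feature 1 is unchanged.
rng = random.Random(0)
before = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
after = [[rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]

w, bias = fit_time_classifier(before, after)

# Use the pre-drift centroid as a "typical" pre-drift example.
typical = mean(before)
cf = counterfactual(typical, w, bias)

# Per-feature change needed to make the typical sample look post-drift.
delta = [c - t for c, t in zip(cf, typical)]
print("per-feature change:", delta)
```

In this toy setting the change vector is dominated by feature 0, the only feature that was shifted, which is the kind of per-feature reading of drift that the abstract describes. A real application would need a stronger classifier and a counterfactual method suited to it.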