Game-Theoretic Interpretability for Temporal Modeling
This work addresses interpretability for temporal modeling. The contribution is incremental: it extends interpretability methods from fixed-dimensional inputs to temporal sequences.
The paper tackles the problem of making temporal models interpretable by proposing a co-operative game between a predictor and an explainer, which locally assesses how well the predictor conforms to an interpretable family, without restricting the predictor's functional class.
Interpretability has arisen as a key desideratum of machine learning models alongside performance. Approaches so far have been primarily concerned with fixed-dimensional inputs, emphasizing feature relevance or selection. In contrast, we focus on temporal modeling and the problem of tailoring the predictor, functionally, towards an interpretable family. To this end, we propose a co-operative game between the predictor and an explainer without any a priori restrictions on the functional class of the predictor. The goal of the explainer is to highlight, locally, how well the predictor conforms to the chosen interpretable family of temporal models. Our co-operative game is set up asymmetrically in terms of information sets for efficiency reasons. We develop and illustrate the framework in the context of temporal sequence models with examples.
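The predictor-explainer game described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's formulation: the interpretable family is taken to be local linear maps, the neighborhood is a fixed window of time steps, the deviation is a mean squared residual, and the names `explain_locally`, `game_objective`, `radius`, and `weight` are hypothetical. The asymmetry is modeled by letting the explainer see only the predictor's outputs in the window, never the targets.

```python
from typing import Callable, List, Tuple


def explain_locally(
    predictor: Callable[[float], float],
    inputs: List[float],
    center: int,
    radius: int,
) -> Tuple[float, float, float]:
    """Fit a line to the predictor's outputs in a window around `center`.

    Returns (slope, intercept, deviation), where deviation is the mean
    squared gap between the predictor and its best local linear fit.
    Assumption: the explainer only queries the predictor; it sees no targets.
    """
    lo = max(0, center - radius)
    hi = min(len(inputs), center + radius + 1)
    xs = inputs[lo:hi]
    ys = [predictor(x) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    # Ordinary least squares for one input; fall back to a constant fit
    # when every input in the window is identical.
    if var_x > 0:
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    else:
        slope = 0.0
    intercept = mean_y - slope * mean_x
    deviation = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n
    return slope, intercept, deviation


def game_objective(
    predictor: Callable[[float], float],
    inputs: List[float],
    targets: List[float],
    radius: int,
    weight: float,
) -> float:
    """Predictor's objective: prediction error plus weighted local deviation.

    Assumption: squared error for accuracy, and a hypothetical trade-off
    coefficient `weight` on the explainer's reported deviation.
    """
    total = 0.0
    for t, (x, y) in enumerate(zip(inputs, targets)):
        _, _, deviation = explain_locally(predictor, inputs, t, radius)
        total += (predictor(x) - y) ** 2 + weight * deviation
    return total / len(inputs)
```

In this sketch a predictor that is already linear incurs essentially no deviation penalty, while a curved predictor is penalized in proportion to how poorly a line fits it within each window. The predictor's functional class is never restricted; only its local agreement with the interpretable family enters the objective.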