Optimal Explanations of Linear Models
This work addresses the need for systematic explanations of predictive models, which can enhance trust and fairness. The contribution is incremental in that it builds on existing interpretability methods for linear models.
The authors tackle the problem of interpreting linear models by proposing an optimization framework that decomposes a linear model into a sequence of models of increasing complexity. This decomposition yields a family of interpretability metrics and supports the study of the tradeoff between interpretability and accuracy.
When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.
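To make the idea of a coordinate-wise decomposition concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each step sets exactly one coefficient to its final value, that the cost of an intermediate model is its mean squared error, and that the order of updates is chosen greedily. The function names `mse` and `greedy_coordinate_path`, the synthetic data, and the greedy selection rule are all hypothetical choices made for illustration; the abstract mentions exact algorithms and scalable heuristics without specifying them.

```python
import numpy as np

def mse(X, y, beta):
    """Mean squared error of the linear model with coefficients beta."""
    residual = X @ beta - y
    return float(residual @ residual) / len(y)

def greedy_coordinate_path(X, y, beta_target):
    """Build a sequence of models from the zero model to beta_target.

    Assumption (not from the source): each step sets one coefficient to
    its target value, picking the coordinate that gives the lowest MSE.
    """
    beta = np.zeros_like(beta_target, dtype=float)
    # Only coordinates that are nonzero in the target need an update.
    remaining = [j for j in range(len(beta_target)) if beta_target[j] != 0]
    path = [beta.copy()]
    while remaining:
        best_j, best_cost = None, np.inf
        for j in remaining:
            candidate = beta.copy()
            candidate[j] = beta_target[j]
            cost = mse(X, y, candidate)
            if cost < best_cost:
                best_j, best_cost = j, cost
        beta[best_j] = beta_target[best_j]
        remaining.remove(best_j)
        path.append(beta.copy())
    return path

# Synthetic data, used purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
beta_target = np.array([3.0, 0.0, -1.0, 0.5])
y = X @ beta_target + 0.1 * rng.normal(size=200)

path = greedy_coordinate_path(X, y, beta_target)
for step, beta in enumerate(path):
    print(step, beta, round(mse(X, y, beta), 4))
```

Each model in `path` differs from its predecessor in a single coefficient, so the sequence can be read as a step-by-step explanation of the final model. A greedy order is not guaranteed to be optimal, which is consistent with the abstract's remark that computing the decomposition optimally is a difficult problem.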