ML, AI, CV, LG · Mar 2, 2021

Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations

arXiv:2103.01890v1 · 81 citations
Originality: Incremental advance
AI Analysis

This paper addresses unreliable interpretability in machine learning, which matters particularly in domains such as healthcare. Its contribution is incremental, since it builds on existing amortized explanation methods.

The paper tackles the problem of interpretability methods encoding predictions in their interpretations, which undermines their fidelity. It introduces EVAL-X for quantitative evaluation and REAL-X as an amortized explanation method that learns a predictor model approximating the true data distribution. Results show that EVAL-X can detect encoded predictions and that REAL-X offers advantages in quantitative and radiologist evaluations.

While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
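The encoding problem described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the data, the two selectors, the co-adapted predictor, and the counting-based `evaluator` are all hypothetical stand-ins chosen for illustration. The evaluator computes the exact conditional p(y = 1 | x_S) by counting over a tiny discrete dataset, which plays the role of a model that approximates the true data generating distribution given any subset of the input.

```python
from itertools import product

# Hypothetical toy data: three binary features; only feature 2 determines the label.
DATA = [(x, x[2]) for x in product([0, 1], repeat=3)]

def encoding_selector(x):
    """Picks an uninformative feature whose *index* leaks the label."""
    return (0,) if x[2] == 1 else (1,)

def honest_selector(x):
    """Picks the feature that actually determines the label."""
    return (2,)

def joint_predictor(subset):
    """Predictor co-adapted to encoding_selector: it reads only which feature was selected."""
    return 1 if subset == (0,) else 0

def evaluator(x, subset):
    """Empirical p(y=1 | x_subset), obtained by counting matching rows in DATA."""
    matches = [y for (z, y) in DATA if all(z[i] == x[i] for i in subset)]
    return sum(matches) / len(matches)

def joint_accuracy(selector):
    """Accuracy of the co-adapted predictor on the selector's outputs."""
    return sum(joint_predictor(selector(x)) == y for x, y in DATA) / len(DATA)

def eval_score(selector):
    """Mean probability the evaluator assigns to the true label."""
    total = 0.0
    for x, y in DATA:
        p1 = evaluator(x, selector(x))
        total += p1 if y == 1 else 1.0 - p1
    return total / len(DATA)

if __name__ == "__main__":
    print("joint predictor, encoding selector:", joint_accuracy(encoding_selector))
    print("evaluator, encoding selector:      ", eval_score(encoding_selector))
    print("evaluator, honest selector:        ", eval_score(honest_selector))
```

In this toy setting the co-adapted predictor scores perfectly on selections that contain no label information, while the subset-conditional evaluator falls to chance on the same selections and rewards only the selector that picks the informative feature.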

Code Implementations: 1 repo