ML · LG · May 4, 2023

Interpretable Regional Descriptors: Hyperbox-Based Local Explanations

arXiv:2305.02780v1 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable explanations that serve both modelers and decision subjects. It is an incremental advance because it adapts existing hyperbox methods rather than proposing a new family of methods.

The paper tackles the problem of providing local, model-agnostic explanations for machine learning predictions. It introduces interpretable regional descriptors (IRDs), which are hyperboxes describing feature changes that do not affect a prediction. A benchmark study compares methods for computing IRDs on several quality measures and identifies two strategies to improve them.

This work introduces interpretable regional descriptors, or IRDs, for local, model-agnostic interpretations. IRDs are hyperboxes that describe how an observation's feature values can be changed without affecting its prediction. They justify a prediction by providing a set of "even if" arguments (semi-factual explanations), and they indicate which features affect a prediction and whether pointwise biases or implausibilities exist. A concrete use case shows that this is valuable for both machine learning modelers and persons subject to a decision. We formalize the search for IRDs as an optimization problem and introduce a unifying framework for computing IRDs that covers desiderata, initialization techniques, and a post-processing method. We show how existing hyperbox methods can be adapted to fit into this unified framework. A benchmark study compares the methods based on several quality measures and identifies two strategies to improve IRDs.
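To make the hyperbox idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `toy_model` whose prediction depends only on the first feature, represents a candidate box by per-feature lower and upper bounds, and estimates by uniform sampling what fraction of points inside the box keep the prediction within a chosen interval. The names `toy_model`, `contains`, and `estimated_precision`, and the sampling-based check itself, are illustrative assumptions.

```python
import numpy as np


def toy_model(X):
    # Hypothetical stand-in for a fitted model: the prediction depends only
    # on feature 0 and ignores feature 1.
    return (X[:, 0] > 0.5).astype(float)


def contains(lower, upper, x):
    # True if the point x lies inside the axis-aligned box [lower, upper].
    return bool(np.all((x >= lower) & (x <= upper)))


def estimated_precision(model, lower, upper, y_low, y_high,
                        n_samples=2000, seed=0):
    # Sample points uniformly inside the box and return the fraction whose
    # prediction falls in the closed interval [y_low, y_high].
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_samples, len(lower)))
    preds = model(X)
    return float(np.mean((preds >= y_low) & (preds <= y_high)))


# Observation to explain and its prediction under the toy model.
x_interest = np.array([0.8, 0.3])
pred = toy_model(x_interest[None, :])[0]

# Candidate box A: feature 0 stays above the decision threshold, feature 1
# may take any value in [0, 1] ("even if feature 1 were anywhere in [0, 1]").
lower_a, upper_a = np.array([0.6, 0.0]), np.array([1.0, 1.0])

# Candidate box B: too wide in feature 0, so some points flip the prediction.
lower_b, upper_b = np.array([0.2, 0.0]), np.array([1.0, 1.0])

precision_a = estimated_precision(toy_model, lower_a, upper_a, pred, pred)
precision_b = estimated_precision(toy_model, lower_b, upper_b, pred, pred)

print("box A contains x:", contains(lower_a, upper_a, x_interest))
print("box A estimated precision:", precision_a)
print("box B estimated precision:", precision_b)
```

In this toy setting, box A is a plausible descriptor because every sampled point keeps the prediction, while box B is not. The full width of box A in feature 1 reflects that this feature does not affect the prediction. A sampling estimate can only suggest, not prove, that no point in a box changes the prediction.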

