Human-in-the-Loop Interpretability Prior
This work addresses the challenge of making AI models interpretable to users by moving beyond easy-to-quantify proxies. It may appear incremental, since it builds on prior interpretability work and adds human feedback.
The paper tackles the problem of optimizing models for interpretability by directly including humans in the optimization loop. The resulting algorithm minimizes the number of user studies needed to find models that are both predictive and interpretable. Human subjects results show trends towards different proxy notions of interpretability on different datasets.
We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user studies needed to find models that are both predictive and interpretable, and we demonstrate our approach on several datasets. Our human subjects results show trends towards different proxy notions of interpretability on different datasets, which suggests that different proxies are preferred on different tasks.
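To make the idea of a budgeted human-in-the-loop search concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a fixed pool of candidate models, an accuracy threshold, and a limited number of human queries. The names `sparsity_proxy`, `select_model`, `ask_human`, `min_accuracy`, and `budget` are hypothetical, and the human scores are simulated with made-up illustrative numbers rather than drawn from any user study.

```python
def sparsity_proxy(weights):
    """Easy-to-quantify proxy: number of nonzero weights (fewer is assumed simpler)."""
    return sum(1 for w in weights if w != 0)


def select_model(candidates, accuracy, ask_human, min_accuracy, budget):
    """Return the accurate candidate with the best human score under a query budget.

    Assumption: candidates are visited in order of the sparsity proxy, which is
    only a stand-in for a real strategy for choosing which model to show next.
    """
    # Keep only models that are predictive enough to be worth a user study.
    eligible = [m for m in candidates if accuracy(m) >= min_accuracy]
    eligible.sort(key=sparsity_proxy)

    best, best_score, queries = None, float("-inf"), 0
    for model in eligible[:budget]:
        score = ask_human(model)  # one (simulated) user study per call
        queries += 1
        if score > best_score:
            best, best_score = model, score
    return best, best_score, queries


# Illustrative data only: weight tuples stand in for models.
candidates = [(0.5, 0.0, 0.0), (0.4, 0.3, 0.0), (0.2, 0.1, 0.7), (0.0, 0.0, 0.0)]
accuracies = dict(zip(candidates, [0.81, 0.86, 0.90, 0.50]))
simulated_human_scores = dict(zip(candidates, [0.6, 0.9, 0.3, 1.0]))

best, best_score, queries = select_model(
    candidates,
    accuracy=accuracies.get,
    ask_human=simulated_human_scores.get,
    min_accuracy=0.8,
    budget=2,
)
print(best, best_score, queries)
```

In this toy run the sparsest accurate model is not the one the simulated human prefers, which mirrors the abstract's point that a single proxy need not match human judgments on every task.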