HC, AI, CV · Jul 1, 2021

Quality Metrics for Transparent Machine Learning With and Without Humans In the Loop Are Not Correlated

arXiv:2107.02033v1 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of evaluating interpretability in machine learning for researchers and practitioners, highlighting the need for human-in-the-loop assessments, though it is incremental in applying existing psychophysical techniques to XAI.

The study tackled the problem of assessing the usefulness of explainable AI (XAI) methods for humans by using psychophysical experiments to measure annotation accuracy and task time, finding that automated quality metrics did not correlate with human-centric performance.

The field of explainable artificial intelligence (XAI) has brought about an arsenal of methods to render Machine Learning (ML) predictions more interpretable. But how useful the explanations provided by transparent ML methods are for humans remains difficult to assess. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotation tasks we study the impact of different interpretability approaches on annotation accuracy and task time. We compare these quality metrics with classical, automated XAI quality metrics. Our results demonstrate that psychophysical experiments allow for robust quality assessment of transparency in machine learning. Interestingly, the quality metrics computed without humans in the loop neither provided a consistent ranking of interpretability methods nor were representative of how useful an explanation was for humans. These findings highlight the potential of methods from classical psychophysics for modern machine learning applications. We hope that our results provide convincing arguments for evaluating interpretability in its natural habitat, human-ML interaction, if the goal is to obtain an authentic assessment of interpretability.
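One way to make the comparison in the abstract concrete is to rank a set of interpretability methods by an automated metric and by a human-in-the-loop metric, then measure how well the two rankings agree. The sketch below uses Spearman rank correlation for this; that choice is an assumption for illustration and not necessarily the analysis the authors ran. The method names and all scores are made-up placeholders, not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder scores for four unnamed interpretability methods.
methods = ["method_A", "method_B", "method_C", "method_D"]
automated_metric = [0.90, 0.70, 0.50, 0.30]   # computed without humans
human_accuracy = [0.60, 0.80, 0.55, 0.75]     # crowdsourced annotation accuracy

rho = spearman(automated_metric, human_accuracy)
print(f"Spearman rho between automated and human rankings: {rho:.2f}")
```

A value of rho near 1 would mean the automated metric ranks the methods the same way the human annotators' performance does; a value near 0 corresponds to the situation the abstract describes, where the automated ranking says little about usefulness for humans.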
