LG · CV · ML · Jul 8, 2020

A Benchmark of Medical Out of Distribution Detection

arXiv:2007.04250v2 · 64 citations
AI Analysis

This work addresses the need for reliable Out-of-Distribution Detection (OoDD) in medical diagnostic tools to prevent erroneous predictions. Its contribution is incremental: it benchmarks existing methods rather than introducing new ones.

The paper addresses the problem of choosing an effective Out-of-Distribution Detection (OoDD) method for medical imaging by benchmarking popular methods across three domains: chest X-ray, fundus imaging, and histology slides. A simple binary classifier on the feature representation achieved the best accuracy and AUPRC on average, but the methods often failed to detect images close to the training distribution.

Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice.

Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be flagged by an OoDD method prior to diagnosis.

Our approach: This paper defines three categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides.

Results: Our experiments show that although methods yield good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution.

Conclusion: We find that a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools that employ these OoDD methods should remain vigilant: images very close to the training distribution, yet not in it, could yield unexpected results.
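To make the "binary classifier on the feature representation" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes feature vectors have already been extracted from a trained network and replaces them with synthetic Gaussian vectors; the function names, the hand-written logistic regression, and the "far" and "near" mean shifts are all hypothetical choices made for illustration. The detector is trained to separate in-distribution features from one set of OoD features, then scored by accuracy and AUPRC (computed here as average precision) on a far-from-training and a close-to-training OoD set.

```python
import numpy as np


def make_features(n, mean, dim=16, seed=0):
    """Synthetic stand-in for network feature vectors (an assumption, not real data)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=1.0, size=(n, dim))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_detector(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression OoD detector by gradient descent (label 1 = OoD)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y  # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def average_precision(y_true, scores):
    """Average precision: mean of precision@k over the ranks k that hold an OoD sample."""
    order = np.argsort(-scores)  # highest OoD score first
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


def evaluate(w, b, X_in, X_out):
    """Score a mix of in-distribution (label 0) and OoD (label 1) features."""
    X = np.vstack([X_in, X_out])
    y = np.concatenate([np.zeros(len(X_in)), np.ones(len(X_out))])
    scores = sigmoid(X @ w + b)
    accuracy = float(((scores > 0.5) == (y == 1)).mean())
    return {"accuracy": accuracy, "auprc": average_precision(y, scores)}


def run_demo():
    # Train on in-distribution features vs. an OoD set far from them.
    X_in_train = make_features(200, mean=0.0, seed=1)
    X_out_train = make_features(200, mean=3.0, seed=2)
    X_train = np.vstack([X_in_train, X_out_train])
    y_train = np.concatenate([np.zeros(200), np.ones(200)])
    w, b = fit_detector(X_train, y_train)

    # Evaluate on held-out samples: OoD far from vs. close to the training distribution.
    X_in_test = make_features(200, mean=0.0, seed=3)
    far = evaluate(w, b, X_in_test, make_features(200, mean=3.0, seed=4))
    near = evaluate(w, b, X_in_test, make_features(200, mean=0.3, seed=5))
    return {"far": far, "near": near}


results = run_demo()
print(results)
```

On this toy data the detector should separate the far OoD set much more reliably than the near one, which mirrors the qualitative failure mode the abstract warns about. The numbers it prints say nothing about the paper's actual results.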

Code Implementations: 3 repos