On the Pitfalls of Using the Residual Error as Anomaly Score
This work exposes a critical flaw in residual-based anomaly localization for medical imaging and could help improve diagnostic reliability. However, its contribution is incremental, since it critiques existing approaches without proposing a new solution.
The paper argues that residual images used as anomaly scores in medical imaging are unreliable, because the model's own reconstruction errors can overshadow true anomalies. It demonstrates this through theoretical analysis and experiments, showing that imperfect reconstructions can substantially degrade detection accuracy.
Many current state-of-the-art methods for anomaly localization in medical images rely on computing a residual image between a potentially anomalous input image and its "healthy" reconstruction. Because the model is expected to reconstruct an unseen anomalous region incorrectly, that region should produce large residuals, which then serve as a score for detecting anomalies. However, this assumption ignores residuals caused by imperfect reconstructions from the machine learning models themselves. Such errors can easily overshadow the residuals of interest and therefore call into question the use of residual images as a scoring function. Our work explores this fundamental problem of residual images in detail. We define the problem theoretically and, in a series of experiments, thoroughly evaluate the influence of anomaly intensity and texture against the effect of imperfect reconstructions. Code and experiments are available at https://github.com/FeliMe/residual-score-pitfalls
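To make the failure mode concrete, here is a minimal toy simulation. It is not the authors' code; their experiments are in the linked repository. It assumes an additive intensity anomaly, models the reconstruction error as i.i.d. Gaussian noise on top of the true healthy image, scores each pixel by the absolute residual, and measures pixel-wise AUROC. Under these assumptions the residual inside the anomaly is |a - e| and outside it is |e|, where a is the anomaly intensity and e the reconstruction error. The functions `pixel_auroc` and `residual_score_auroc` and all parameter values are hypothetical.

```python
import numpy as np


def pixel_auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Pixel-wise AUROC via exhaustive pairwise comparison (Mann-Whitney U)."""
    pos = scores[labels].ravel()   # scores of anomalous pixels
    neg = scores[~labels].ravel()  # scores of normal pixels
    # Fraction of (anomalous, normal) pixel pairs ranked correctly; ties count half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def residual_score_auroc(intensity: float, recon_noise: float,
                         size: int = 32, seed: int = 0) -> float:
    """Simulate a residual-based detector on a toy image (all settings hypothetical)."""
    rng = np.random.default_rng(seed)
    healthy = rng.uniform(0.0, 1.0, size=(size, size))  # toy "healthy" image
    mask = np.zeros((size, size), dtype=bool)
    mask[12:20, 12:20] = True  # 8x8 square anomaly
    anomalous = healthy + intensity * mask  # additive intensity anomaly
    # Imperfect "healthy" reconstruction: the true healthy image plus model error
    # (assumed i.i.d. Gaussian here; real errors are spatially structured).
    reconstruction = healthy + rng.normal(0.0, recon_noise, size=(size, size))
    residual = np.abs(anomalous - reconstruction)  # the residual anomaly score map
    return pixel_auroc(residual, mask)


for noise in (0.0, 0.1, 0.5):
    print(f"recon noise {noise}: AUROC = {residual_score_auroc(0.3, noise):.3f}")
```

With a perfect reconstruction (`recon_noise=0.0`), the residual isolates the anomaly exactly and the AUROC is 1.0. As the reconstruction error grows relative to the anomaly intensity, normal pixels acquire residuals comparable to anomalous ones and the score degrades. The i.i.d. noise model is a deliberate simplification; it ignores the texture and structure effects the paper studies.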