CVLGMar 4, 2021

Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis

arXiv:2103.03328v38 citations
AI Analysis

This work addresses the challenge of ensuring trustworthy deep learning for clinicians by providing empirical insights into generalization in medical imaging, though it is incremental as it applies existing measures to a new domain.

The paper tackled the problem of predicting generalization performance for deep learning models in medical image analysis by evaluating 25 complexity measures on breast ultrasound images, finding that PAC-Bayes flatness-based and path norm-based measures were most consistent, and that multi-task learning improved generalization.

The generalization performance of deep learning models for medical image analysis often decreases on images collected with different devices for data acquisition, device settings, or patient population. A better understanding of the generalization capacity on new images is crucial for clinicians' trustworthiness in deep learning. Although significant research efforts have been recently directed toward establishing generalization bounds and complexity measures, still, there is often a significant discrepancy between the predicted and actual generalization performance. As well, related large empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness-based and path norm-based measures produce the most consistent explanation for the combination of models and data. We also investigate the use of multi-task classification and segmentation approach for breast images, and report that such learning approach acts as an implicit regularizer and is conducive toward improved generalization.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes