CV LG IV ML | Dec 8, 2019

Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

John R. Zech, Jessica Zosa Forde, Michael L. Littman

arXiv:1912.03606

Originality: Synthesis-oriented

AI Analysis

This work highlights a critical source of variability in medical imaging CNNs that could affect individual patient care, though the contribution is incremental because it builds on existing methods.

The study reproduced CheXNet with 50 random seeds to assess how the ordering of training data affects fine-tuned CNNs for chest radiograph analysis. It found substantial variability in individual predictions (mean coefficient of variation 0.543), which ensembling 10 models reduced by nearly 70%.

We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (mean ln[(maximum probability)/(minimum probability)] of 2.45, mean coefficient of variation of 0.543). This variability at the level of individual radiographs was not fully reflected in the variability of AUC on a large test set. Averaging predictions from 10 models reduced variability by nearly 70% (mean coefficient of variation from 0.543 to 0.169, t-statistic 15.96, p-value < 0.0001). We encourage researchers to be aware of the potential variability of CNNs and to ensemble predictions from multiple models to minimize the effect this variability may have on the care of individual patients when these models are deployed clinically.
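The per-radiograph variability measures named in the abstract can be illustrated with a minimal sketch. It assumes each radiograph has one predicted probability per model run (one run per random seed). All function names here (`log_max_min_ratio`, `coefficient_of_variation`, `ensemble_predictions`, `simulate_runs`) are hypothetical and this is not the authors' code. Three further assumptions: the coefficient of variation uses the sample standard deviation; ensembles are formed from non-overlapping groups of runs (the paper's exact grouping or resampling scheme is not given here); and the "model outputs" are synthetic noisy logits rather than real CheXNet predictions.

```python
import math
import random
import statistics


def log_max_min_ratio(probs):
    """ln(max / min) of one radiograph's predicted probabilities across model runs."""
    return math.log(max(probs) / min(probs))


def coefficient_of_variation(probs):
    """Sample standard deviation divided by the mean (assumption: sample, not population, SD)."""
    return statistics.stdev(probs) / statistics.mean(probs)


def ensemble_predictions(run_probs, group_size):
    """Average non-overlapping groups of `group_size` runs (hypothetical grouping scheme).

    Leftover runs that do not fill a complete group are ignored.
    """
    n_groups = len(run_probs) // group_size
    return [
        statistics.mean(run_probs[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]


def simulate_runs(n_runs, base_logit, noise_sd, seed):
    """Synthetic stand-in for per-seed model outputs: noisy logits passed through a sigmoid."""
    rng = random.Random(seed)
    return [
        1.0 / (1.0 + math.exp(-(base_logit + rng.gauss(0.0, noise_sd))))
        for _ in range(n_runs)
    ]


if __name__ == "__main__":
    runs = simulate_runs(n_runs=50, base_logit=-2.0, noise_sd=1.0, seed=0)
    ensembles = ensemble_predictions(runs, group_size=10)  # 5 ensembles of 10 runs each
    print("single-model ln(max/min):", round(log_max_min_ratio(runs), 3))
    print("single-model CV:", round(coefficient_of_variation(runs), 3))
    print("10-model ensemble CV:", round(coefficient_of_variation(ensembles), 3))
```

On this synthetic data the ensemble coefficient of variation comes out well below the single-model value. This is expected when the run-to-run noise is roughly independent, because averaging 10 runs shrinks the spread of the averaged prediction. Real model runs may be correlated, so the reduction observed in practice can differ.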

