CV · LG · IV · Apr 20, 2021

Systematic investigation into generalization of COVID-19 CT deep learning models with Gabor ensemble for lung involvement scoring

arXiv:2105.15094v1 · 7 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the limited practical use of COVID-19 CT models, which stems in part from their poor generalization. It is incremental, building on existing published models with image filtering techniques.

This study investigated the generalization of COVID-19 CT deep learning models across different datasets. It found high variability but showed that, under certain conditions, models can generalize well, with F1 scores of up to 86%. Its ensemble model achieved 75% accuracy for zero lung involvement and 96% for 75-100% involvement.
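The headline numbers above are of two kinds: a binary F1 score for cross-dataset COVID-19 classification, and an accuracy reported separately for each lung-involvement band. The sketch below shows both computations in plain Python. It is illustrative only: the toy labels are made up, and treating the 0% band as the negative class is an assumption, not something the summary states.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy_per_band(bands, y_true, y_pred):
    """Accuracy computed separately within each lung-involvement band."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for band, t, p in zip(bands, y_true, y_pred):
        total[band] += 1
        correct[band] += int(t == p)
    return {band: correct[band] / total[band] for band in total}


if __name__ == "__main__":
    # Toy labels (hypothetical, not the paper's data): 1 = COVID-19 positive.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
    print(round(f1_score(y_true, y_pred), 3))

    # Assumption: scans in the "0%" band should be predicted negative (0),
    # scans in the "75-100%" band positive (1).
    bands = ["0%"] * 4 + ["75-100%"] * 4
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0, 0, 0, 1, 1, 1, 1, 1]
    print(accuracy_per_band(bands, truth, preds))
```

For real evaluations, scikit-learn's `sklearn.metrics.f1_score` computes the same binary F1.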

The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focusing on the diagnosis and stratification of COVID-19 from medical images. Despite this large-scale research effort, these models have found limited practical application, due in part to their unproven generalization beyond their source studies. This study investigates the generalizability of key published models through cross-dataset validation on publicly available COVID-19 computed tomography (CT) data. We then assess the predictive ability of these models for COVID-19 severity using an independent new dataset that is stratified by COVID-19 lung involvement. Each inter-dataset study is performed using histogram equalization and contrast limited adaptive histogram equalization (CLAHE), with and without a learnable Gabor filter. The study shows high variability in the generalization of models trained on these datasets, due to varied image provenances and acquisition processes, amongst other factors. We show that, under certain conditions, a model trained on an internally consistent dataset can generalize well to an external dataset despite structural differences between the datasets, with F1 scores of up to 86%. Our best-performing model shows high predictive accuracy for lung involvement score on an independent dataset for which expertly labelled lung involvement stratification is available. Combining our best model for disease-positive prediction with our best model for disease-negative prediction into an ensemble via a min-max function produced a superior model for lung involvement prediction, with average predictive accuracy of 75% for zero lung involvement and 96% for 75-100% lung involvement, and an almost linear relationship between these stratifications.
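The preprocessing named in the abstract can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' pipeline: it applies global histogram equalization to an 8-bit slice and then a small bank of fixed Gabor filters. In the paper the Gabor filter is learned; here `ksize`, `sigma`, `lambd`, `gamma` and the four orientations are hypothetical values chosen for illustration. CLAHE is omitted; OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` is a ready-made implementation.

```python
import numpy as np


def histogram_equalize(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]  # CDF at the darkest intensity present
    denom = cdf[-1] - cdf_min
    if denom == 0:
        return img.copy()  # constant image: nothing to spread out
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]  # map every pixel through the lookup table


def gabor_kernel(ksize, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real 2-D Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / lambd + psi)
    return envelope * carrier


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D correlation with zero padding (odd square kernel).

    With psi=0 the Gabor kernel is point-symmetric, so this equals convolution.
    """
    pad = kernel.shape[0] // 2
    padded = np.pad(img.astype(np.float64), pad)
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic low-contrast stand-in for a CT slice (not real data).
    ct_slice = rng.integers(40, 120, size=(64, 64), dtype=np.uint8)
    equalized = histogram_equalize(ct_slice)
    # Hypothetical filter-bank settings: 4 orientations, one scale.
    thetas = np.linspace(0, np.pi, 4, endpoint=False)
    bank = [gabor_kernel(11, sigma=3.0, theta=t, lambd=8.0) for t in thetas]
    responses = np.stack([filter2d(equalized, k) for k in bank])
    print(equalized.min(), equalized.max(), responses.shape)
```

Stacking the oriented responses as channels is one common way to feed Gabor features to a CNN; making `theta`, `sigma` and `lambd` trainable parameters would turn this fixed bank into a learnable layer.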
