Timepoint-Specific Benchmarking of Deep Learning Models for Glioblastoma Follow-Up MRI
This work addresses a critical diagnostic problem for glioblastoma patients and clinicians, although its contribution is incremental: it benchmarks existing methods on new time-point-specific data.
This study tackles the challenge of distinguishing true tumor progression from pseudoprogression on follow-up MRI in glioblastoma by benchmarking 11 deep learning models at different post-treatment time points. Accuracy was similar across time points (about 0.70-0.74), but discrimination improved at the later follow-up, and a Mamba+CNN hybrid offered the best accuracy-efficiency trade-off.
Differentiating true tumor progression (TP) from treatment-related pseudoprogression (PsP) in glioblastoma remains challenging, especially at early follow-up. We present the first stage-specific, cross-sectional benchmarking of deep learning (DL) models for follow-up MRI using the Burdenko GBM Progression cohort (n = 180). We analyze scans from each post-radiotherapy (post-RT) follow-up independently to test whether architecture performance depends on the time point. Eleven representative DL models from five families (CNNs, LSTMs, hybrids, transformers, and selective state-space models) were trained under a unified, quality-control (QC)-driven pipeline with patient-level cross-validation. Across both stages, accuracies were comparable (~0.70-0.74), but discrimination improved at the second follow-up, with F1 and AUC increasing for several models, suggesting greater class separability later in the care pathway. A Mamba+CNN hybrid consistently offered the best accuracy-efficiency trade-off, while transformer variants delivered competitive AUCs at substantially higher computational cost, and lightweight CNNs were efficient but less reliable. Performance was also sensitive to batch size, underscoring the need for standardized training protocols. Notably, absolute discrimination remained modest overall, reflecting the intrinsic difficulty of TP vs. PsP and the dataset's limited size and class imbalance. These results establish a stage-aware benchmark and motivate future work incorporating longitudinal modeling, multi-sequence MRI, and larger multi-center cohorts.
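Two ideas in the abstract are easy to make concrete: patient-level cross-validation (every sample from a given patient lands in the same fold, so no patient contributes to both training and validation) and the distinction between accuracy, which depends on a single decision threshold, and F1/AUC, which reflect class-specific and threshold-free discrimination. The following is a minimal standard-library sketch under stated assumptions, not the authors' pipeline: each sample (for example, a scan or a 2D slice) is tagged with a patient ID, labels are encoded as 1 = TP and 0 = PsP, and the function names `patient_level_folds`, `accuracy`, `f1_positive`, and `roc_auc` are hypothetical. The fold assignment is plain grouped k-fold with greedy size balancing and does not stratify by label; in practice, scikit-learn's `GroupKFold` or `StratifiedGroupKFold` would typically fill this role.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_level_folds(patient_ids: Sequence[str], k: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split sample indices into k (train, val) pairs so
    that all samples of one patient always fall in the same fold."""
    samples_by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        samples_by_patient[pid].append(idx)
    # Greedy balancing: assign patients with the most samples first,
    # each to the fold that currently holds the fewest samples.
    folds: List[List[int]] = [[] for _ in range(k)]
    for pid in sorted(samples_by_patient, key=lambda p: (-len(samples_by_patient[p]), p)):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(samples_by_patient[pid])
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, val))
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of thresholded predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here to be TP): 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: probability that a random positive scores higher
    than a random negative, with ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these are not results from the study.
    ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
    for train, val in patient_level_folds(ids, k=2):
        print("val patients:", sorted({ids[i] for i in val}))

    y_true = [1, 1, 1, 0, 0, 1]
    scores = [0.9, 0.6, 0.4, 0.55, 0.2, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), f1_positive(y_true, y_pred), roc_auc(y_true, scores))
```

Because AUC is computed over all positive/negative score pairs, it can change while accuracy at a fixed 0.5 threshold stays flat, which is one way the abstract's "comparable accuracy, improved F1 and AUC" pattern can arise.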