CVFeb 23

HD-TTA: Hypothesis-Driven Test-Time Adaptation for Safer Brain Tumor Segmentation

arXiv:2602.19454v1h-index: 5
Originality Highly original
AI Analysis

This work addresses safety-critical issues in medical segmentation for clinical deployment, offering a novel approach to mitigate risks like tumor spillage into healthy tissue.

The paper tackles the problem of unsafe tumor mask predictions in brain tumor segmentation during test-time adaptation by proposing a hypothesis-driven framework that generates competing geometric hypotheses and selects the safest outcome, resulting in a reduction of Hausdorff Distance by approximately 6.4 mm and an improvement in Precision by over 4%.

Standard Test-Time Adaptation (TTA) methods typically treat inference as a blind optimization task, applying generic objectives to all or filtered test samples. In safety-critical medical segmentation, this lack of selectivity often causes the tumor mask to spill into healthy brain tissue or degrades predictions that were already correct. We propose Hypothesis-Driven TTA, a novel framework that reformulates adaptation as a dynamic decision process. Rather than forcing a single optimization trajectory, our method generates intuitive competing geometric hypotheses: compaction (is the prediction noisy? trim artifacts) versus inflation (is the valid tumor under-segmented? safely inflate to recover). It then employs a representation-guided selector to autonomously identify the safest outcome based on intrinsic texture consistency. Additionally, a pre-screening Gatekeeper prevents negative transfer by skipping adaptation on confident cases. We validate this proof-of-concept on a cross-domain binary brain tumor segmentation task, applying a source model trained on adult BraTS gliomas to unseen pediatric and more challenging meningioma target domains. HD-TTA improves safety-oriented outcomes (Hausdorff Distance (HD95) and Precision) over several state-of-the-art representative baselines in the challenging safety regime, reducing the HD95 by approximately 6.4 mm and improving Precision by over 4%, while maintaining comparable Dice scores. These results demonstrate that resolving the safety-adaptation trade-off via explicit hypothesis selection is a viable, robust path for safe clinical model deployment. Code will be made publicly available upon acceptance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes