SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection

Yifan Li, Mehrdad Salimitari, Taiyu Zhang, Guang Li, David Dreizin

arXiv:2602.23447v1 | 2.6 | h-index: 26

Originality: Highly original

AI Analysis

This paper aims to improve rare-lesion detection in medical imaging in support of radiologists, an incremental but important step for medical AI.

The paper addresses rare-lesion detection in whole-body CT, where extreme class imbalance and low target-to-volume ratios lead to precision collapse. The authors introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation. The method improves generative realism, with MS-SSIM increasing from 0.63 to 0.83 and FID decreasing from 118.4 to 46.5, and it improves long-tail detection performance, particularly at low prevalence and low target-to-volume ratios.
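To make "precision collapse" concrete, here is a minimal sketch of how precision falls with prevalence when a detector's sensitivity and specificity are held fixed. It is not taken from the paper: the function name `precision_at` and the operating point (0.90 sensitivity, 0.95 specificity) are illustrative assumptions.

```python
def precision_at(prevalence, sensitivity, specificity):
    """Precision (positive predictive value) from Bayes' rule.

    The operating point (sensitivity, specificity) is held fixed, so the
    detector's position on the ROC curve does not change with prevalence.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, chosen only for illustration.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    ppv = precision_at(prevalence, sensitivity=0.90, specificity=0.95)
    print(f"prevalence={prevalence:<6} precision={ppv:.3f}")
```

With these assumed values, precision drops from roughly 0.95 at 50% prevalence to below 0.02 at 0.1% prevalence, even though the ROC operating point is unchanged. This is why precision-based metrics such as AUPRC are more informative than AUROC in long-tail settings.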

Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.
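The separation of low-frequency brightness from high-frequency detail can be illustrated with a single-level discrete wavelet transform. The sketch below is an assumption-laden toy, not the authors' implementation: the abstract does not name the wavelet family, so the Haar wavelet is used for simplicity; it operates on one 2D slice rather than a 3D volume; and the function names `haar_dwt2` and `haar_idwt2` are hypothetical.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a 2D list with even dimensions.

    Returns four half-resolution sub-bands:
      approx      - local average (low-frequency brightness)
      detail_cols - differences between adjacent columns
      detail_rows - differences between adjacent rows
      detail_diag - diagonal differences
    """
    approx, detail_cols, detail_rows, detail_diag = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2.0)
            rows[1].append((a - b + d - e) / 2.0)
            rows[2].append((a + b - d - e) / 2.0)
            rows[3].append((a - b - d + e) / 2.0)
        approx.append(rows[0])
        detail_cols.append(rows[1])
        detail_rows.append(rows[2])
        detail_diag.append(rows[3])
    return approx, detail_cols, detail_rows, detail_diag


def haar_idwt2(approx, detail_cols, detail_rows, detail_diag):
    """Invert haar_dwt2, recovering the original 2D list."""
    image = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            ll, dc = approx[i][j], detail_cols[i][j]
            dr, dd = detail_rows[i][j], detail_diag[i][j]
            top += [(ll + dc + dr + dd) / 2.0, (ll - dc + dr - dd) / 2.0]
            bottom += [(ll + dc - dr - dd) / 2.0, (ll - dc - dr + dd) / 2.0]
        image += [top, bottom]
    return image


# Toy slice: a bright 2x2 "lesion" on a dark background.
slice_2d = [
    [0, 0, 0, 0],
    [0, 8, 8, 0],
    [0, 8, 8, 0],
    [0, 0, 0, 0],
]
# The same slice with a uniform brightness offset of 10.
brighter = [[v + 10 for v in row] for row in slice_2d]

bands = haar_dwt2(slice_2d)
bands_brighter = haar_dwt2(brighter)
```

In this toy, the uniform offset changes only the `approx` band, while the three detail bands (which carry the lesion's edges) are identical for both slices. That property is what lets a wavelet-domain model treat brightness and structure as separate targets, and `haar_idwt2` shows that no information is lost in the decomposition.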
