Multi-Granularity 3D Kidney Lesion Characterization from CT Volumes
For radiologists and clinical NLP researchers, this work provides per-lesion predictions from 3D CT that can feed structured report generation. Performance on rare lesions remains poor, which the authors attribute to data scarcity rather than architecture.
The authors reformulate kidney CT characterization as a per-lesion set-prediction task and propose LesionDETR, which predicts lesion type, size, enhancement, and attenuation. Using 2,619 curated CT volumes, with KiTS23 held out for zero-shot external validation, they report a bilateral side-level abnormality AUC of 0.799 on UF-Health and 0.817 on KiTS23. A count-conditioned variant reaches a per-lesion mAP of only 0.190 on cystic lesions, and rare solid-lesion AP stays at the noise floor.
Radiology reports describe kidney lesions by type, size, enhancement, and attenuation, yet existing 3D methods predict only at the patient or organ level. We reformulate kidney CT characterization as a per-lesion set-prediction task: one model emits a variable number of lesions per kidney, each with four clinical attributes. We curated 2,619 CT volumes from 788 patients at one academic medical center, with multi-granularity side- and per-lesion labels, and used KiTS23 (489 cases) for zero-shot external validation. We propose \textbf{LesionDETR}, a DETR-style architecture with size-distance Hungarian matching and a hierarchical loss that aggregates per-slot outputs to side-level objectives. Across four input representations and six encoder initializations, two design choices dominate: a segmentation mask as an input channel, and same-domain abdominal pretraining (SuPreM); generic large-corpus pretraining is no better than random initialization. LesionDETR reaches bilateral side-level abnormality AUC $0.799 \pm 0.009$ on UF-Health and $0.817 \pm 0.072$ on KiTS23. A count-conditioned variant reaches per-lesion mAP $0.190 \pm 0.083$ on cystic lesions; rare solid-lesion AP stays at the noise floor, pointing to targeted data collection, not architecture, as the next bottleneck. The framework yields verified per-lesion predictions for downstream structured report generation.
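To make the set-prediction idea concrete, here is a minimal sketch of one-to-one matching between predicted slots and annotated lesions, followed by a side-level aggregation of per-slot outputs. It is not the authors' implementation. Several pieces are assumptions for illustration: the cost reads "size-distance" as the absolute difference in lesion size plus a lesion-type term, the field names (`size_cm`, `type_probs`) and weights are hypothetical, the side-level score uses a noisy-OR over slots, and the optimal assignment is found by brute force over permutations rather than by the Hungarian algorithm itself (both return a minimum-cost assignment, but brute force is only practical for a handful of slots).

```python
from itertools import permutations
from math import prod


def match_cost(pred, target, size_weight=1.0, class_weight=1.0):
    """Cost of assigning one predicted slot to one annotated lesion (assumed form)."""
    # Size term: absolute difference in lesion size.
    size_term = abs(pred["size_cm"] - target["size_cm"])
    # Type term: one minus the probability the slot gives to the true lesion type.
    class_term = 1.0 - pred["type_probs"].get(target["type"], 0.0)
    return size_weight * size_term + class_weight * class_term


def match_slots(preds, targets):
    """Return (pairs, cost) for the minimum-cost one-to-one assignment.

    pairs is a list of (slot_index, target_index). Unmatched slots are the
    ones a set-prediction loss would push toward "no lesion".
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many slots as annotated lesions")
    best_pairs, best_cost = [], float("inf")
    # Try every way of picking a distinct slot for each target.
    for slot_order in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[s], targets[t]) for t, s in enumerate(slot_order))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(s, t) for t, s in enumerate(slot_order)]
    return best_pairs, best_cost


def side_abnormal_prob(slot_lesion_probs):
    """Noisy-OR aggregation: the side is abnormal if any slot holds a lesion."""
    return 1.0 - prod(1.0 - p for p in slot_lesion_probs)


if __name__ == "__main__":
    preds = [
        {"size_cm": 3.1, "type_probs": {"cyst": 0.2, "solid": 0.8}},
        {"size_cm": 1.0, "type_probs": {"cyst": 0.9, "solid": 0.1}},
        {"size_cm": 0.5, "type_probs": {"cyst": 0.5, "solid": 0.5}},
    ]
    targets = [
        {"size_cm": 1.2, "type": "cyst"},
        {"size_cm": 3.0, "type": "solid"},
    ]
    pairs, cost = match_slots(preds, targets)
    print(pairs, round(cost, 3))
    print(round(side_abnormal_prob([0.9, 0.8, 0.1]), 3))
```

With more than a few slots per kidney, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop. The noisy-OR is one differentiable way to tie per-slot outputs to a side-level label; the aggregation in the paper's hierarchical loss may differ.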