Text Embedded Swin-UMamba for DeepLesion Segmentation
This work addresses automatic lesion measurement for the clinical assessment of diseases such as lymphoma and represents an incremental advance that combines imaging and text data.
The study tackled lesion segmentation on CT scans by integrating text descriptions into the Swin-UMamba architecture. It achieved a Dice score of 82% and a Hausdorff distance of 6.58 pixels, a 37% improvement over the LLM-driven LanGuideMedSeg model.
Segmentation of lesions on CT enables automatic measurement for the clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for lesion segmentation. We used the publicly available ULS23 DeepLesion dataset together with short-form descriptions of the findings from the corresponding reports. On the test set, lesion segmentation achieved a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels. The proposed Text-Swin-UMamba model outperformed prior approaches: it achieved a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001) and surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
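The two reported metrics can be illustrated with a minimal NumPy sketch on 2D binary masks. The abstract does not state how the Hausdorff distance was computed (boundary pixels, all foreground pixels, or a percentile variant such as HD95), nor how per-lesion scores were aggregated; this sketch assumes the symmetric Hausdorff distance over all foreground pixel coordinates, measured in pixels. The function names `dice_score` and `hausdorff_distance` and the toy masks are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks (assumed definition)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a convention, not from the source).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def hausdorff_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Hausdorff distance in pixels over all foreground pixel coordinates."""
    p = np.argwhere(pred.astype(bool)).astype(float)    # (N, 2) foreground coords of pred
    t = np.argwhere(target.astype(bool)).astype(float)  # (M, 2) foreground coords of target
    if len(p) == 0 or len(t) == 0:
        return float("inf")  # undefined when one mask is empty
    # Pairwise Euclidean distances between every pred pixel and every target pixel.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)  # (N, M)
    forward = d.min(axis=1).max()   # farthest pred pixel from its nearest target pixel
    backward = d.min(axis=0).max()  # farthest target pixel from its nearest pred pixel
    return float(max(forward, backward))


# Toy example (hypothetical data): a 3x3 lesion and a prediction shifted one column right.
target = np.zeros((8, 8), dtype=np.uint8)
target[2:5, 2:5] = 1
pred = np.zeros((8, 8), dtype=np.uint8)
pred[2:5, 3:6] = 1

print(dice_score(pred, target))          # 6 overlapping pixels: 2*6 / (9 + 9)
print(hausdorff_distance(pred, target))  # every stray pixel is 1 pixel from the other mask
```

Here the overlap is 6 of 9 pixels in each mask, so Dice is 12/18 ≈ 0.667, and the Hausdorff distance is 1 pixel. The brute-force pairwise distance matrix keeps the sketch short; for large masks, a distance-transform or boundary-only formulation would be cheaper.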