LesionDiffusion: Towards Text-controlled General Lesion Synthesis
This work addresses the high cost of annotated data for lesion recognition in medical imaging by offering a scalable, controllable synthesis method. The contribution is incremental, however, since it builds on existing diffusion and inpainting techniques.
The paper proposes LesionDiffusion, a text-controllable framework for synthesizing lesions in 3D CT images to support lesion recognition tasks. The authors report that it improves segmentation performance, generalizes to unseen lesion types and organs, and outperforms state-of-the-art models.
Fully-supervised lesion recognition methods in medical imaging face challenges due to the reliance on large annotated datasets, which are expensive and difficult to collect. To address this, synthetic lesion generation has become a promising approach. However, existing models struggle with scalability, fine-grained control over lesion attributes, and the generation of complex structures. We propose LesionDiffusion, a text-controllable lesion synthesis framework for 3D CT imaging that generates both lesions and corresponding masks. By utilizing a structured lesion report template, our model provides greater control over lesion attributes and supports a wider variety of lesion types. We introduce a dataset of 1,505 annotated CT scans with paired lesion masks and structured reports, covering 14 lesion types across 8 organs. LesionDiffusion consists of two components: a lesion mask synthesis network (LMNet) and a lesion inpainting network (LINet), both guided by lesion attributes and image features. Extensive experiments demonstrate that LesionDiffusion significantly improves segmentation performance, with strong generalization to unseen lesion types and organs, outperforming current state-of-the-art models. Code is available at https://github.com/HengruiTianSJTU/LesionDiffusion.
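The two-stage flow described in the abstract (a structured report conditions a mask synthesis step, whose output then conditions an inpainting step) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the `LesionReport` fields, the function names, and the example values are hypothetical, and the two toy stages are simple geometric and intensity stand-ins for LMNet and LINet, not diffusion models.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Tiny nested-list stand-in for a 3D CT volume, indexed [z][y][x].
Volume = List[List[List[float]]]
Mask = List[List[List[int]]]


@dataclass(frozen=True)
class LesionReport:
    """Hypothetical structured report; the field names are illustrative only."""
    lesion_type: str
    organ: str
    shape: str
    radius_voxels: int

    def to_prompt(self) -> str:
        # Flatten the structured fields into a single text condition.
        return (f"{self.lesion_type} in {self.organ}, "
                f"shape: {self.shape}, radius: {self.radius_voxels} voxels")


def toy_mask_stage(ct: Volume, report: LesionReport,
                   center: Tuple[int, int, int]) -> Mask:
    """Stand-in for LMNet: draws a sphere instead of sampling a mask."""
    cz, cy, cx = center
    r2 = report.radius_voxels ** 2
    return [[[1 if (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2 <= r2 else 0
              for x in range(len(ct[0][0]))]
             for y in range(len(ct[0]))]
            for z in range(len(ct))]


def toy_inpaint_stage(ct: Volume, mask: Mask, delta: float = -40.0) -> Volume:
    """Stand-in for LINet: shifts intensity inside the mask, leaves the rest."""
    return [[[v + delta if m else v for v, m in zip(row_v, row_m)]
             for row_v, row_m in zip(slice_v, slice_m)]
            for slice_v, slice_m in zip(ct, mask)]


def synthesize(ct: Volume, report: LesionReport,
               center: Tuple[int, int, int]) -> Tuple[Volume, Mask]:
    """Stage 1 produces the mask; stage 2 fills the masked region."""
    mask = toy_mask_stage(ct, report, center)
    image = toy_inpaint_stage(ct, mask)
    return image, mask


# Usage: a uniform 5x5x5 volume and an illustrative report.
ct = [[[100.0] * 5 for _ in range(5)] for _ in range(5)]
report = LesionReport("cyst", "liver", "round", 1)
image, mask = synthesize(ct, report, center=(2, 2, 2))
```

The point of the sketch is the interface rather than the models: because the pipeline returns the synthetic image together with its mask, each generated sample is already a labeled training pair for a downstream segmentation network.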