Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model
This work addresses a transformative opportunity in diagnostics and research by enabling 3D CT generation from free-text descriptions, though it appears incremental, since it builds on existing diffusion models and adds a novel prompt formulation.
The paper tackles the problem of generating 3D CT volumes from free-text descriptions. It introduces Text2CT, a diffusion-model-based approach that achieves state-of-the-art results in preserving anatomical fidelity and capturing intricate structures described in diverse textual inputs.
Generating 3D CT volumes from descriptive free-text inputs presents a transformative opportunity in diagnostics and research. In this paper, we introduce Text2CT, a novel approach for synthesizing 3D CT volumes from textual descriptions using a diffusion model. Unlike previous methods that rely on fixed-format text input, Text2CT employs a novel prompt formulation that enables generation from diverse, free-text descriptions. The proposed framework encodes medical text into latent representations and decodes them into high-resolution 3D CT scans, bridging the gap between semantic text inputs and detailed volumetric representations in a unified 3D framework. Our method demonstrates superior performance in preserving anatomical fidelity and capturing intricate structures as described in the input text. Extensive evaluations show that our approach achieves state-of-the-art results, with promising potential applications in diagnostics and data augmentation.
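The abstract states only that text is encoded into a latent representation and that the model decodes it into a 3D CT volume; it does not give the architecture, noise schedule, or sampler. The sketch below illustrates one common way such a text-conditioned pipeline can be structured, assuming a latent diffusion setup with DDPM ancestral sampling over a 4D latent tensor (channels, depth, height, width). This is an assumption, not the authors' confirmed method. `toy_text_encoder`, `toy_denoiser`, `toy_decoder`, and `generate_ct` are hypothetical placeholders: a real system would use a learned medical text encoder, a trained 3D noise-prediction network, and a learned volumetric decoder.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear noise schedule (as in the original DDPM paper); Text2CT's schedule is not stated.
    return np.linspace(beta_start, beta_end, T)


def toy_text_encoder(text, dim=8):
    # Hypothetical stand-in for a learned text encoder: a normalized bag of byte values.
    emb = np.zeros(dim)
    for i, b in enumerate(text.encode("utf-8")):
        emb[i % dim] += b
    norm = np.linalg.norm(emb)
    return emb / norm if norm > 0 else emb


def toy_denoiser(z_t, t, text_emb):
    # Hypothetical noise predictor eps_theta(z_t, t, c); a real model would be a trained 3D network.
    return 0.1 * z_t + 0.01 * text_emb.mean()


def toy_decoder(z, scale=2):
    # Hypothetical latent-to-volume decoder: average channels, then nearest-neighbor upsample each spatial axis.
    vol = z.mean(axis=0)
    for axis in range(3):
        vol = np.repeat(vol, scale, axis=axis)
    return vol


def generate_ct(text, latent_shape=(4, 8, 8, 8), T=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    c = toy_text_encoder(text)  # condition vector derived from the free-text prompt
    z = rng.standard_normal(latent_shape)  # start from pure Gaussian noise in latent space
    for t in reversed(range(T)):
        eps = toy_denoiser(z, t, c)
        # DDPM posterior mean: (z_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Posterior variance: beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t)
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            z = mean + np.sqrt(var) * rng.standard_normal(latent_shape)
        else:
            z = mean
    return toy_decoder(z)  # map the denoised latent to a (toy) 3D volume


if __name__ == "__main__":
    vol = generate_ct("Contrast-enhanced chest CT with a small nodule in the right upper lobe.")
    print(vol.shape)
```

The key design point is that the text condition enters only through the noise predictor at every denoising step, so the same sampler can be reused for any prompt; the decoder runs once at the end, which keeps the iterative diffusion process in a compact latent space rather than at full CT resolution.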