Synthetic Lung X-ray Generation through Cross-Attention and Affinity Transformation
This work addresses the resource-intensive task of medical image annotation for researchers and practitioners, though the contribution is incremental because it builds on existing diffusion models.
The paper tackles the problem of generating synthetic lung X-ray images with accurate semantic masks to reduce data collection and annotation costs. Segmentation models trained on the synthetic data are comparable to, or better than, those trained on real datasets.
Collecting and annotating medical images is a time-consuming and resource-intensive task. Generating synthetic data with diffusion models offers a cost-effective alternative. This paper introduces a new method for automatically generating accurate semantic masks for synthetic lung X-ray images, based on a Stable Diffusion model trained on text-image pairs. The method uses cross-attention mapping between text and image to extend text-driven image synthesis to semantic mask generation. It employs text-guided cross-attention information to identify specific areas in an image and combines this with additional techniques to produce high-resolution, class-differentiated pixel masks. This approach significantly reduces the costs associated with data collection and annotation. The experimental results demonstrate that segmentation models trained on synthetic data generated using the method are comparable to, and in some cases even better than, models trained on real datasets. This shows the effectiveness of the method and its potential to revolutionize medical image analysis.
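As a rough illustration of how a text-guided cross-attention map can be turned into a pixel mask, the sketch below upsamples the attention map of one text token, normalizes it, and thresholds it. This is not the paper's implementation: the function name `attention_to_mask`, the array layout `(num_tokens, h, w)`, the nearest-neighbor upsampling, and the fixed threshold of 0.5 are all assumptions made for the example, and the paper's additional refinement steps are not reproduced here.

```python
import numpy as np


def attention_to_mask(attn, token_idx, out_size, threshold=0.5):
    """Convert one token's cross-attention map into a binary mask.

    attn: array of shape (num_tokens, h, w), assumed to hold cross-attention
          maps already averaged over heads and layers (hypothetical layout).
    token_idx: index of the text token naming the class (e.g. "lung").
    out_size: side length of the square output mask; must be a multiple
              of both h and w in this simplified sketch.
    threshold: cut-off applied after min-max normalization (assumed value).
    """
    amap = np.asarray(attn[token_idx], dtype=np.float64)
    h, w = amap.shape
    if out_size % h != 0 or out_size % w != 0:
        raise ValueError("out_size must be a multiple of the map height and width")

    # Nearest-neighbor upsampling to the image resolution.
    up = np.repeat(np.repeat(amap, out_size // h, axis=0), out_size // w, axis=1)

    # Min-max normalize to [0, 1]; a constant map carries no signal.
    lo, hi = up.min(), up.max()
    norm = (up - lo) / (hi - lo) if hi > lo else np.zeros_like(up)

    # Pixels with strong attention to the token are labeled as the class.
    return (norm >= threshold).astype(np.uint8)


# Toy example: token 1 attends to the central 2x2 block of a 4x4 map.
attn = np.zeros((2, 4, 4))
attn[1, 1:3, 1:3] = 1.0
mask = attention_to_mask(attn, token_idx=1, out_size=8)
```

In a multi-class setting, one such map per class token could be compared pixel by pixel to assign labels; a fixed threshold is only the simplest possible choice and would likely need tuning or replacement in practice.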