CV · CL · LG · Jul 7, 2025

CytoDiff: AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics

arXiv:2507.05063v2 · h-index: 9 · Has Code · 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality: Incremental advance
AI Analysis

This work addresses data scarcity and privacy issues in medical diagnostics, offering a tool to enhance classifier performance in hematology, though it is incremental as it applies existing generative methods to a specific domain.

The paper tackles limited and imbalanced biomedical datasets for white blood cell classification, a critical task in diagnosing hematological malignancies such as AML. It introduces CytoDiff, a Stable Diffusion model that generates synthetic images. Adding these images to a small real training set raised accuracy from 27% to 78% for a ResNet classifier and from 62% to 77% for a CLIP-based classifier.
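
The reported gains are easier to compare when absolute and relative improvement are kept apart. The short sketch below does that arithmetic with the four accuracy figures quoted above; the helper name `accuracy_gain` is illustrative and not taken from the paper or its repository.

```python
def accuracy_gain(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


# Accuracy figures as quoted in the summary (percent).
reported = [("ResNet", 27, 78), ("CLIP", 62, 77)]

for name, before, after in reported:
    points, relative = accuracy_gain(before, after)
    print(f"{name}: +{points} points, {relative:.0f}% relative")
# ResNet: +51 points, 189% relative
# CLIP: +15 points, 24% relative
```

Read this way, the "+51%" and "+15%" figures are gains in percentage points; the relative improvement over the real-data-only baseline is considerably larger for ResNet than for CLIP.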

Biomedical datasets are often constrained by stringent privacy requirements and frequently suffer from severe class imbalance. These two aspects hinder the development of accurate machine learning models. While generative AI offers a promising solution, producing synthetic images of sufficient quality for training robust classifiers remains challenging. This work addresses the classification of individual white blood cells, a critical task in diagnosing hematological malignancies such as acute myeloid leukemia (AML). We introduce CytoDiff, a stable diffusion model fine-tuned with LoRA weights and guided by few-shot samples that generates high-fidelity synthetic white blood cell images. Our approach demonstrates substantial improvements in classifier performance when training data is limited. Using a small, highly imbalanced real dataset, the addition of 5,000 synthetic images per class improved ResNet classifier accuracy from 27% to 78% (+51%). Similarly, CLIP-based classification accuracy increased from 62% to 77% (+15%). These results establish synthetic image generation as a valuable tool for biomedical machine learning, enhancing data coverage and facilitating secure data sharing while preserving patient privacy. Paper code is publicly available at https://github.com/JanCarreras24/CytoDiff.
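
To illustrate why adding a fixed number of synthetic images per class helps with imbalance, here is a minimal sketch that tracks per-class counts before and after augmentation. Only the figure of 5,000 synthetic images per class comes from the abstract; the class names and real-image counts are hypothetical placeholders, and the helpers are not part of the CytoDiff code.

```python
def augment_counts(real_counts, synthetic_per_class):
    """Add the same number of synthetic images to every class."""
    return {label: n + synthetic_per_class for label, n in real_counts.items()}


def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest class (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())


# Hypothetical real-image counts; NOT the dataset used in the paper.
real = {"neutrophil": 800, "lymphocyte": 300, "myeloblast": 20}

# 5,000 synthetic images per class, as stated in the abstract.
mixed = augment_counts(real, 5000)

print(f"real only: ratio {imbalance_ratio(real):.2f}")
print(f"with synthetic: ratio {imbalance_ratio(mixed):.2f}")
```

Under these assumed counts the largest-to-smallest class ratio drops from 40 to roughly 1.16, because the synthetic images dominate every class. This says nothing about image quality, which the abstract identifies as the harder problem.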

Code Implementations: 1 repo
