IV | CV | Aug 8, 2025

Clinically-guided Data Synthesis for Laryngeal Lesion Detection

arXiv:2508.06182v1 | 2 citations | h-index: 18 | MICCAI
Originality: Incremental advance
AI Analysis

This work addresses data scarcity for specialized endoscopic computer-aided diagnosis and detection (CADx/e) systems in otorhinolaryngology, offering a way to accelerate the development of automated tools for laryngeal disease diagnosis.

The study tackles data scarcity for computer-aided diagnosis and detection systems in laryngology by generating synthetic laryngeal endoscopic images with a Latent Diffusion Model coupled with a ControlNet adapter. In a downstream detection task, adding only 10% synthetic data improved the lesion detection rate by 9% on internal testing and by 22.1% on out-of-domain external data.

Although computer-aided diagnosis (CADx) and detection (CADe) systems have made significant progress in various medical domains, their application is still limited in specialized fields such as otorhinolaryngology. In the latter, current assessment methods heavily depend on operator expertise, and the high heterogeneity of lesions complicates diagnosis, with biopsy persisting as the gold standard despite its substantial costs and risks. A critical bottleneck for specialized endoscopic CADx/e systems is the lack of well-annotated datasets with sufficient variability for real-world generalization. This study introduces a novel approach that exploits a Latent Diffusion Model (LDM) coupled with a ControlNet adapter to generate laryngeal endoscopic image-annotation pairs, guided by clinical observations. The method addresses data scarcity by conditioning the diffusion process to produce realistic, high-quality, and clinically relevant image features that capture diverse anatomical conditions. The proposed approach can be leveraged to expand training datasets for CADx/e models, empowering the assessment process in laryngology. Indeed, during a downstream task of detection, the addition of only 10% synthetic data improved the detection rate of laryngeal lesions by 9% when the model was internally tested and 22.1% on out-of-domain external data. Additionally, the realism of the generated images was evaluated by asking 5 expert otorhinolaryngologists with varying expertise to rate their confidence in distinguishing synthetic from real images. This work has the potential to accelerate the development of automated tools for laryngeal disease diagnosis, offering a solution to data scarcity and demonstrating the applicability of synthetic data in real-world scenarios.
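
The data-expansion step described in the abstract (adding a small share of synthetic image-annotation pairs to the real training set) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `mix_in_synthetic`, its parameters, and the sample representation are hypothetical, and it assumes that "10% synthetic data" means a number of synthetic samples equal to 10% of the real training set, which the abstract does not specify.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_in_synthetic(
    real: Sequence[T],
    synthetic: Sequence[T],
    fraction: float = 0.10,
    seed: int = 0,
) -> List[T]:
    """Return the real samples plus a random subset of synthetic ones.

    ASSUMPTION: `fraction` is measured relative to the number of real
    samples (0.10 means one synthetic sample per ten real ones). The
    source abstract does not state which convention was used.
    """
    if fraction < 0.0:
        raise ValueError("fraction must be non-negative")

    n_extra = round(fraction * len(real))
    if n_extra > len(synthetic):
        raise ValueError("not enough synthetic samples for this fraction")

    rng = random.Random(seed)  # local RNG keeps the mix reproducible
    extra = rng.sample(list(synthetic), n_extra)  # draw without replacement

    mixed = list(real) + extra
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed
```

With 200 real samples and a pool of 50 synthetic ones, `mix_in_synthetic(real, synthetic, fraction=0.10)` returns 220 samples, 20 of them synthetic. Each sample would in practice be an image-annotation pair, so the same call works for any detector training loop that consumes a list of such pairs.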

