CV · May 16

The Learnability Gap in Medical Latent Diffusion

arXiv:2605.17087 · 68.3
Predicted impact: top 46% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For medical imaging practitioners using generative data augmentation, the paper reveals that latent space structure, not fidelity, is the primary bottleneck, shifting focus from autoencoder fine-tuning to latent space design.

The paper identifies and formalizes the 'learnability gap' in latent diffusion models for medical imaging: pretrained autoencoders preserve the features needed for classification, but they structure their latents in ways that are hard for classifiers to learn from. Across five autoencoder families and four medical benchmarks, the gap persists even after medical-domain fine-tuning of the autoencoder. Noise-conditioned latent classifiers with FiLM layers and image-space distillation partially narrow the gap, offer 64x throughput and 120x memory gains over image-space models, and double as diagnostic tools for latent space quality.
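One way to read the gap described above is as a comparison of the same classifier trained in three input spaces: original images, autoencoder reconstructions, and latents. The sketch below is only an illustration of that reading. The function name `decompose_gap`, the choice of accuracy as the metric, and the subtraction-based decomposition are assumptions, not the paper's formal definition, and the numbers in the usage lines are made-up placeholders rather than reported results.

```python
def decompose_gap(acc_image, acc_recon, acc_latent):
    """Split the image-to-latent performance drop into two parts.

    Assumed reading (not the paper's formal definition):
      fidelity_loss    = drop caused by encoding and decoding the image
      learnability_gap = further drop when training on latents directly
    """
    return {
        "fidelity_loss": acc_image - acc_recon,
        "learnability_gap": acc_recon - acc_latent,
    }


# Placeholder accuracies, chosen only to illustrate the pattern the summary
# describes: near-lossless reconstruction, but a clear drop in latent space.
example = decompose_gap(acc_image=0.90, acc_recon=0.89, acc_latent=0.80)
print(example)
```

Under this reading, a small `fidelity_loss` paired with a large `learnability_gap` would point to latent structure, rather than lost information, as the bottleneck.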

Generative data augmentation with latent diffusion models is a promising strategy for addressing class imbalance in medical imaging, yet current approaches focus on perceptual fidelity and domain-specific autoencoder fine-tuning while neglecting a more fundamental bottleneck. We identify and formalize the learnability gap: large-scale pretrained autoencoders faithfully encode discriminative features for medical classification, as evidenced by near-lossless performance in reconstruction space, yet their latent representations are structured in ways that are difficult for classifiers to learn from. Across five autoencoder families and four medical benchmarks spanning chest radiography, dermatoscopy, computed tomography, and echocardiography, we show that this gap persists regardless of architecture, initialization strategy, or hyperparameter tuning, and that medical-domain fine-tuning of the autoencoder does not close it. To probe and partially narrow the gap, we develop noise-conditioned latent classifiers with FiLM layers and image-space distillation that offer 64x throughput and 120x memory gains over image-space models while serving as diagnostic tools for latent space quality. Our analysis provides a new framework for evaluating autoencoder latent spaces and identifies their structure, rather than their fidelity or domain specificity, as the primary obstacle to closing the performance gap between real and synthetic medical training data.
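The abstract mentions noise-conditioned latent classifiers with FiLM layers but gives no architectural details here. As a rough illustration of the general FiLM idea (a per-channel scale and shift predicted from a conditioning signal, in this case the diffusion noise level), a minimal pure-Python sketch follows. Everything in it is an assumption for illustration: the sinusoidal embedding, the single linear projection, the helper names, and the layout of `features` as a list of channels. It is not the authors' implementation.

```python
import math


def noise_embedding(t, dim):
    """Sinusoidal embedding of a scalar noise level t; dim is assumed even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(x, weight, bias):
    """Dense layer; weight holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features, gamma, beta):
    """FiLM modulation: each channel c becomes gamma[c] * x + beta[c]."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def film_block(features, t, weight, bias):
    """Predict (gamma, beta) from the noise level t and modulate features.

    weight has 2 * channels rows: the first half yields gamma, the second beta.
    """
    channels = len(features)
    emb = noise_embedding(t, len(weight[0]))
    out = linear(emb, weight, bias)
    gamma, beta = out[:channels], out[channels:]
    return film(features, gamma, beta)


# Hypothetical usage: 2 channels, embedding size 4. Zero weights with biases
# of 1 for gamma and 0 for beta make the block an identity at any noise level.
features = [[1.0, 2.0], [3.0, 4.0]]
weight = [[0.0] * 4 for _ in range(4)]
bias = [1.0, 1.0, 0.0, 0.0]
modulated = film_block(features, t=0.5, weight=weight, bias=bias)
```

Initializing the projection so that gamma starts near 1 and beta near 0 is a common design choice, since the classifier then begins as if unconditioned and learns how much the noise level should alter each channel.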

