LG · Oct 16, 2025

Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation

arXiv:2510.14190v1 · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable and interpretable generation in diffusion models, offering a domain-specific improvement for applications requiring precise manipulation.

The paper addresses diffusion models' lack of latent spaces that are interpretable enough for control. It introduces ConDA, a framework that uses contrastive learning to align latent geometry with system dynamics, and reports improved controllability on benchmarks including fluid dynamics and facial expression generation.

Diffusion models excel at generation, but their latent spaces are not explicitly organized for interpretable control. We introduce ConDA (Contrastive Diffusion Alignment), a framework that applies contrastive learning within diffusion embeddings to align latent geometry with system dynamics. Motivated by recent advances showing that contrastive objectives can recover more disentangled and structured representations, ConDA organizes diffusion latents such that traversal directions reflect underlying dynamical factors. Within this contrastively structured space, ConDA enables nonlinear trajectory traversal that supports faithful interpolation, extrapolation, and controllable generation. Across benchmarks in fluid dynamics, neural calcium imaging, therapeutic neurostimulation, and facial expression, ConDA produces interpretable latent representations with improved controllability compared to linear traversals and conditioning-based baselines. These results suggest that diffusion latents encode dynamics-relevant structure, but exploiting this structure requires latent organization and traversal along the latent manifold.
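The abstract does not spell out ConDA's training objective, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss of the kind commonly applied to embeddings, not the authors' implementation. The function name `info_nce_loss`, the temperature value, the random stand-in latents, and the choice of temporally adjacent states as positive pairs are all assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative; not taken from the paper).

    Row i of `positives` is treated as the positive for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    logits = a @ p.T / temperature                    # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability

    # Row-wise log-softmax; the diagonal holds the matched (positive) pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)

# Hypothetical stand-ins for diffusion latents: z_t at one time step and
# z_next at a nearby step of the same system (assumed positive pairs).
z_t = rng.normal(size=(8, 16))
z_next = z_t + 0.05 * rng.normal(size=(8, 16))

# Reversing the batch order pairs every anchor with an unrelated latent.
z_mismatched = z_next[::-1].copy()

aligned = info_nce_loss(z_t, z_next)
misaligned = info_nce_loss(z_t, z_mismatched)
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {misaligned:.4f}")
```

In this toy setup the loss is lower when each latent is paired with its dynamically adjacent neighbour than when the pairs are scrambled. Minimising such a loss pulls related states together in the embedding space, which is one plausible way to obtain the structured latent geometry the abstract describes.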

