LG | CV | Feb 10

Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders

arXiv:2602.10099v1 | 7 citations | h-index: 1 | Has Code
Originality: Highly original
AI Analysis

The paper addresses a fundamental geometric issue in generative modeling that matters to both researchers and practitioners. It enables efficient synthesis without expensive width scaling, though its contribution is an incremental improvement to existing methods.

The paper tackles the failure of standard diffusion transformers to converge when trained on representation-encoder features. It identifies geometric interference as the root cause and proposes Riemannian Flow Matching with Jacobi Regularization (RJF), which enables convergence without width scaling and achieves an FID of 3.37 with the DiT-B architecture.

Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representations directly. While recent work attributes this to a capacity bottleneck, proposing computationally expensive width scaling of diffusion transformers, we demonstrate that the failure is fundamentally geometric. We identify Geometric Interference as the root cause: standard Euclidean flow matching forces probability paths through the low-density interior of the hyperspherical feature space of representation encoders, rather than following the manifold surface. To resolve this, we propose Riemannian Flow Matching with Jacobi Regularization (RJF). By constraining the generative process to the manifold geodesics and correcting for curvature-induced error propagation, RJF enables standard Diffusion Transformer architectures to converge without width scaling. In particular, RJF enables the standard DiT-B architecture (131M parameters) to converge effectively, achieving an FID of 3.37 where prior methods fail to converge. Code: https://github.com/amandpkr/RJF
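
The geometric claim in the abstract can be illustrated with a small sketch. It is not taken from the RJF repository: the function names (normalize, lerp, slerp, norm) and the toy 3-dimensional endpoints are assumptions made for illustration, and Jacobi regularization is not covered. The sketch compares a straight-line (Euclidean) interpolant between two unit vectors with the geodesic (spherical) interpolant, and shows that only the latter stays on the sphere surface.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    n = norm(v)
    return [x / n for x in v]

def lerp(x0, x1, t):
    """Straight-line interpolation, the path used by Euclidean flow matching."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def slerp(x0, x1, t):
    """Geodesic interpolation between two unit vectors on the sphere.

    Assumes x0 and x1 are unit-norm and not antipodal (the geodesic is
    not unique, and the formula is numerically unstable, in that case).
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    omega = math.acos(dot)          # angle between the endpoints
    if omega < 1e-8:
        return list(x0)             # endpoints coincide
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

# Toy endpoints: a hypothetical "noise" point and "data" point on the unit sphere.
x0 = normalize([1.0, 0.0, 0.0])
x1 = normalize([0.0, 1.0, 0.0])

mid_lerp = lerp(x0, x1, 0.5)
mid_slerp = slerp(x0, x1, 0.5)

print(norm(mid_lerp))   # about 0.707: the straight path cuts through the interior
print(norm(mid_slerp))  # about 1.0: the geodesic path stays on the surface
```

In this toy case the straight-line midpoint has norm below 1, so it lies off the sphere where encoder features concentrate, while the geodesic midpoint keeps unit norm. How RJF parameterizes and regularizes the flow along such geodesics is specified in the paper and its code, not in this sketch.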

