CV · Oct 23, 2025

DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

arXiv:2510.20766v1 · 9 citations · h-index: 28
Originality: Highly original
AI Analysis

For researchers and practitioners in high-resolution image generation, this work addresses the cost of scaling diffusion transformers to higher resolutions: it improves fidelity without additional training and with no additional sampling cost.

The paper addresses the high cost of training diffusion transformers at ultra-high resolutions by introducing DyPE, a training-free method that lets pre-trained models generate images at resolutions far beyond those seen in training. It reports state-of-the-art fidelity on multiple benchmarks and demonstrates generation at 16 million pixels using FLUX.

Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high-frequency details take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum to the current stage of the generative process. This approach allows us to generate images at resolutions that dramatically exceed the training resolution, e.g., 16 million pixels using FLUX. On multiple benchmarks, DyPE consistently improves performance and achieves state-of-the-art fidelity in ultra-high-resolution image generation, with gains becoming even more pronounced at higher resolutions. The project page is available at https://noamissachar.github.io/DyPE/.
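
The idea of adjusting the positional encoding at each diffusion step can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes rotary position embeddings (RoPE), swaps in plain position interpolation (dividing positions by a scale factor) as the extrapolation rule, and uses a hypothetical power-law schedule in which the scale is largest at the noisy start (t = 1) and decays to no scaling at the end (t = 0). The names `dynamic_scale`, `rope_angles`, `target_scale`, and `exponent` are illustrative only.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies, one per pair of channels."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def dynamic_scale(t, target_scale, exponent=2.0):
    """Hypothetical schedule (an assumption, not taken from the paper).

    t = 1.0 (pure noise)  -> full scale, equal to target_scale
    t = 0.0 (clean image) -> scale 1.0, i.e. the unmodified encoding
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return 1.0 + (target_scale - 1.0) * t ** exponent


def rope_angles(position, t, head_dim, target_scale, exponent=2.0, base=10000.0):
    """Rotation angles for one token position at diffusion time t.

    Uses simple position interpolation: the position is divided by the
    time-dependent scale before being multiplied by each frequency.
    """
    scale = dynamic_scale(t, target_scale, exponent)
    return [(position / scale) * f for f in rope_frequencies(head_dim, base)]


if __name__ == "__main__":
    # Example: generating at 4x the training side length (target_scale = 4).
    for t in (1.0, 0.5, 0.0):
        first_angle = rope_angles(8, t, head_dim=4, target_scale=4.0)[0]
        print(f"t={t:.1f} scale={dynamic_scale(t, 4.0):.2f} angle={first_angle:.3f}")
```

In a sampler, a function like `rope_angles` would be re-evaluated with the current `t` before each denoising step, so the encoding changes across steps while the model weights stay fixed. The actual scaling rule and schedule used by DyPE should be taken from the paper or project page.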

