CV · Oct 30, 2024

FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

arXiv:2410.22655v1 · 12 citations · h-index: 9
Originality Highly original
AI Analysis

This work addresses the problem of efficient and flexible image generation for AI applications, offering a scalable solution with incremental improvements over existing methods.

The paper tackles the challenge of arbitrary-resolution image generation by proposing FlowDCN, a convolution-based model that achieves state-of-the-art performance with a 4.30 sFID on 256x256 ImageNet and reduces parameters by 8% and FLOPs by 20% compared to transformer-based methods.

Arbitrary-resolution image generation remains a challenging task in AIGC, as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a newly designed learnable group-wise deformable convolution block, FlowDCN offers greater flexibility and capability to handle different resolutions with a single model. FlowDCN achieves a state-of-the-art sFID of 4.30 on the $256\times256$ ImageNet benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in convergence speed (using only $\frac{1}{5}$ of the images), visual quality, parameter count ($8\%$ reduction), and FLOPs ($20\%$ reduction). We believe FlowDCN offers a promising solution for scalable and flexible image synthesis.
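The contrast the abstract draws between quadratic and linear cost can be illustrated with a toy calculation. The sketch below is not FlowDCN's actual cost model: the patch size of 16, the 9 sampling points per position, and all function names are assumptions chosen for illustration. It only shows how the two scaling laws diverge when the resolution doubles from 256x256 to 512x512.

```python
def token_count(height, width, patch=16):
    """Number of spatial positions after patchifying (patch size is an assumed value)."""
    return (height // patch) * (width // patch)


def attention_cost(n_tokens, dim=1):
    """Toy cost of global self-attention: every position attends to every position."""
    return n_tokens * n_tokens * dim


def local_conv_cost(n_tokens, kernel_points=9, dim=1):
    """Toy cost of a local operator: each position reads a fixed number of sampling points."""
    return n_tokens * kernel_points * dim


def growth(cost_fn, base=(256, 256), target=(512, 512)):
    """Ratio of the toy cost at the target resolution to the cost at the base resolution."""
    return cost_fn(token_count(*target)) / cost_fn(token_count(*base))


attention_growth = growth(attention_cost)   # quadratic in the number of positions
conv_growth = growth(local_conv_cost)       # linear in the number of positions
print(attention_growth, conv_growth)        # prints: 16.0 4.0
```

Under these assumptions, doubling each side quadruples the number of positions, so the quadratic term grows 16x while the linear term grows only 4x. Real FLOP counts also depend on channel widths, depth, and implementation details that this sketch ignores.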

