RaPD: Resolution-Agnostic Pixel Diffusion via Semantics-Enriched Implicit Representations
Generative models that synthesize images on discrete grids cannot flexibly change their output resolution. This work addresses that limitation, offering a scalable route to high-resolution generation whose diffusion cost does not grow with the output resolution.
RaPD performs diffusion in a continuous Neural Image Field latent space, so a single denoised latent can be rendered at arbitrary resolutions while the diffusion cost stays fixed. The authors report superior generation quality and resolution scalability.
Natural images are continuous, yet most generative models synthesize them on discrete grids, limiting resolution-flexible generation. Continuous neural fields enable resolution-free rendering, but prior methods introduce continuity only at the decoding stage as an interpolation module, leaving the generative latent space discretized and reconstruction-oriented. We propose RaPD (Resolution-Agnostic Pixel Diffusion), which performs diffusion in a continuous Neural Image Field (NIF) latent space. RaPD bridges the resulting reconstruction-generation gap with Semantic Representation Guidance for generation-aware latent learning and a Coordinate-Queried Attention Renderer for coordinate-conditioned, scale-aware rendering. A single denoised latent can be rendered at arbitrary resolutions by changing only the query coordinates, keeping diffusion cost fixed. Experiments demonstrate superior generation quality and resolution scalability.
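To make the coordinate-querying idea concrete, the following is a minimal sketch, not the RaPD implementation. It assumes a toy renderer in which Fourier-encoded pixel coordinates act as attention queries over a fixed set of latent tokens. The names `ToyRenderer`, `make_grid`, and `encode`, the random untrained weights, and the token and feature sizes are all hypothetical, and the scale-aware conditioning described above is omitted. The only point illustrated is that the same latent yields outputs at different resolutions when nothing but the query grid changes.

```python
import math
import random


def make_grid(height, width):
    """Pixel-center coordinates normalised to [-1, 1], in row-major order."""
    return [
        (2 * (col + 0.5) / width - 1, 2 * (row + 0.5) / height - 1)
        for row in range(height)
        for col in range(width)
    ]


def encode(coord, num_freqs=4):
    """Fourier features of an (x, y) coordinate; length 4 * num_freqs."""
    feats = []
    for k in range(num_freqs):
        for value in coord:
            angle = (2 ** k) * math.pi * value
            feats += [math.sin(angle), math.cos(angle)]
    return feats


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0, 1) / math.sqrt(cols) for _ in range(cols)]
            for _ in range(rows)]


class ToyRenderer:
    """Untrained cross-attention from coordinate queries to latent tokens."""

    def __init__(self, latent_dim, num_freqs=4, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        self.w_key = random_matrix(4 * num_freqs, latent_dim, rng)
        self.w_value = random_matrix(3, latent_dim, rng)

    def render(self, latent_tokens, height, width):
        # Keys and values depend only on the latent, not on the resolution.
        keys = [matvec(self.w_key, t) for t in latent_tokens]
        values = [matvec(self.w_value, t) for t in latent_tokens]
        scale = 1 / math.sqrt(4 * self.num_freqs)
        pixels = []
        for coord in make_grid(height, width):
            query = encode(coord, self.num_freqs)
            scores = [scale * sum(q * k for q, k in zip(query, key))
                      for key in keys]
            top = max(scores)  # subtract the max for a stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            pixels.append([
                sum(w * v[c] for w, v in zip(weights, values)) / total
                for c in range(3)
            ])
        return pixels  # height * width RGB triples, row-major


# A random stand-in for one denoised latent: 16 tokens of dimension 8.
rng = random.Random(1)
latent = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

renderer = ToyRenderer(latent_dim=8)
low = renderer.render(latent, 8, 8)     # same latent, coarse query grid
high = renderer.render(latent, 32, 32)  # same latent, finer query grid
print(len(low), len(high))  # 64 1024
```

In this sketch the rendering cost grows with the number of queried pixels, while the latent and anything that produced it are untouched by the choice of output resolution. That mirrors the fixed diffusion cost claimed above.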