CV · Oct 5, 2025

Scaling Sequence-to-Sequence Generative Neural Rendering

arXiv:2510.04236v1 · 7 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating consistent 3D views without explicit 3D representations, which benefits fields such as computer graphics and VR. It is rated incremental because it builds on existing sequence-to-sequence and generative modeling approaches.

The paper tackles photorealistic neural rendering of objects and scenes by treating 3D as a sequence-to-sequence image synthesis task. The resulting model sets a new state of the art on view synthesis benchmarks and matches per-scene optimisation methods in many-view settings.

We present Kaleido, a family of generative models designed for photorealistic, unified object- and scene-level neural rendering. Kaleido operates on the principle that 3D can be regarded as a specialised sub-domain of video, expressed purely as a sequence-to-sequence image synthesis task. Through a systematic study of scaling sequence-to-sequence generative neural rendering, we introduce key architectural innovations that enable our model to: i) perform generative view synthesis without explicit 3D representations; ii) generate any number of 6-DoF target views conditioned on any number of reference views via a masked autoregressive framework; and iii) seamlessly unify 3D and video modelling within a single decoder-only rectified flow transformer. Within this unified framework, Kaleido leverages large-scale video data for pre-training, which significantly improves spatial consistency and reduces reliance on scarce, camera-labelled 3D datasets, all without any architectural modifications. Kaleido sets a new state of the art on a range of view synthesis benchmarks. Its zero-shot performance substantially outperforms other generative methods in few-view settings, and, for the first time, matches the quality of per-scene optimisation methods in many-view settings.
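The abstract names two mechanisms without detailing them: a rectified flow objective, and generation of any number of target views from any number of reference views. The Python sketch below shows one plausible way these ideas fit together. It is an illustration under stated assumptions, not Kaleido's implementation.

The sketch makes four assumptions:

- Each view is already encoded as a flat latent vector.
- It uses the common straight-line interpolation x_t = (1 - t) x_0 + t ε, with velocity target ε - x_0.
- Reference views are passed in clean, while only target views are noised and supervised.
- The function names (`make_training_example`, `rectified_flow_loss`, `sample`) and the `velocity_fn(x, t)` model interface are hypothetical.

Camera-pose conditioning and the transformer itself are omitted.

```python
import numpy as np


def make_training_example(views, num_ref, rng):
    """Build one masked rectified-flow training example (illustrative sketch).

    views:   (N, D) array, one flattened latent per view (hypothetical layout).
    num_ref: number of views kept clean as reference (conditioning) views.
    Returns model inputs, a boolean target mask, the timestep t and the
    velocity target.
    """
    n = views.shape[0]
    # Randomly pick which views are references; all remaining views are targets.
    order = rng.permutation(n)
    is_target = np.ones(n, dtype=bool)
    is_target[order[:num_ref]] = False

    # Straight-line path between data x0 (t = 0) and Gaussian noise (t = 1).
    t = rng.uniform()
    noise = rng.standard_normal(views.shape)
    x_t = (1.0 - t) * views + t * noise
    # Reference views stay clean; only target views receive noise.
    inputs = np.where(is_target[:, None], x_t, views)
    # Constant velocity along the path: d x_t / dt = noise - x0.
    velocity = noise - views
    return inputs, is_target, t, velocity


def rectified_flow_loss(pred_velocity, velocity, is_target):
    """Mean squared velocity error, computed over target views only."""
    err = (pred_velocity - velocity) ** 2
    return float(err[is_target].mean())


def sample(velocity_fn, views, is_target, num_steps, rng):
    """Euler-integrate target views from noise (t = 1) to data (t = 0).

    Rows of `views` where is_target is False are clean references; target
    rows are ignored and replaced by Gaussian noise. velocity_fn(x, t) is a
    hypothetical model interface returning an (N, D) velocity estimate.
    """
    noise = rng.standard_normal(views.shape)
    x = np.where(is_target[:, None], noise, views)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_cur)
        # Step from t_cur to t_next (dt < 0); reference views are held fixed.
        x = np.where(is_target[:, None], x + (t_next - t_cur) * v, x)
    return x
```

As a sanity check, pass `sample` an oracle `velocity_fn` that returns the exact straight-line velocity `(x - x0) / t`. Euler integration should then recover the clean views, which confirms the interpolation convention. A trained transformer would take the oracle's place. Varying `num_ref` per example is one simple way to train for arbitrary reference/target counts. The paper's actual masking and conditioning scheme is not specified in this abstract.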
