CV | AI | Feb 24

Scaling View Synthesis Transformers

arXiv:2602.21341v1 | 3 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses inefficient scaling in geometry-free view synthesis, offering researchers and practitioners incremental improvements in design principles.

The paper tackles the unclear scaling behavior of transformers for novel view synthesis. It shows that encoder-decoder architectures can be compute-optimal and can surpass the previous state of the art on real-world NVS benchmarks with substantially less training compute.

Geometry-free view synthesis transformers have recently achieved state-of-the-art performance in Novel View Synthesis (NVS), outperforming traditional approaches that rely on explicit geometry modeling. Yet the factors governing their scaling with compute remain unclear. We present a systematic study of scaling laws for view synthesis transformers and derive design principles for training compute-optimal NVS models. Contrary to prior findings, we show that encoder-decoder architectures can be compute-optimal; we trace earlier negative results to suboptimal architectural choices and comparisons across unequal training compute budgets. Across several compute levels, we demonstrate that our encoder-decoder architecture, which we call the Scalable View Synthesis Model (SVSM), scales as effectively as decoder-only models, achieves a superior performance-compute Pareto frontier, and surpasses the previous state-of-the-art on real-world NVS benchmarks with substantially reduced training compute.
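The "performance-compute Pareto frontier" mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure: the function names `pareto_frontier` and `fit_power_law` are hypothetical, the power-law form loss = a * compute^(-b) is an assumed, commonly used scaling-law shape, and the numbers are synthetic, in arbitrary units, chosen only so the example is easy to check by hand.

```python
import math

def pareto_frontier(runs):
    """Keep runs that no other run beats on both compute and loss.

    `runs` is a list of (compute, loss) pairs; lower is better for both.
    """
    frontier = []
    best_loss = math.inf
    # Sort by compute, then loss, so the best run at each budget comes first.
    for compute, loss in sorted(runs):
        if loss < best_loss:  # strictly better than anything cheaper
            frontier.append((compute, loss))
            best_loss = loss
    return frontier

def fit_power_law(points):
    """Least-squares fit of loss = a * compute**(-b) in log-log space.

    Assumed functional form for illustration; returns (a, b).
    """
    xs = [math.log(c) for c, _ in points]
    ys = [math.log(l) for _, l in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

# Synthetic (compute, loss) pairs in arbitrary units, NOT results from the paper.
runs = [(1, 2.0), (1, 3.0), (4, 1.0), (4, 2.5), (16, 0.5), (16, 0.6)]

frontier = pareto_frontier(runs)
a, b = fit_power_law(frontier)
print(frontier)
print(f"loss ~ {a:.2f} * compute^(-{b:.2f})")
```

Comparing two architectures fairly, in the sense the abstract describes, would mean building one such frontier per architecture from runs at matched compute budgets and then comparing the frontiers rather than individual runs.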

