CV · Dec 13, 2023

NViST: In the Wild New View Synthesis from a Single Image with Transformers

arXiv:2312.08568v2 · 22 citations · h-index: 7 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of synthesizing new views from single images in uncontrolled, real-world environments, representing an incremental step towards more practical applications.

The paper tackles novel-view synthesis from a single image of a real-world scene. It proposes NViST, a transformer-based model trained on MVImgNet, a large-scale dataset of casually captured videos, and shows that the model is efficient and generalizes to unseen objects and categories.

We propose NViST, a transformer-based model for efficient and generalizable novel-view synthesis from a single image for real-world scenes. In contrast to many methods that are trained on synthetic data, object-centred scenarios, or in a category-specific manner, NViST is trained on MVImgNet, a large-scale dataset of casually-captured real-world videos of hundreds of object categories with diverse backgrounds. NViST transforms image inputs directly into a radiance field, conditioned on camera parameters via adaptive layer normalisation. In practice, NViST exploits fine-tuned masked autoencoder (MAE) features and translates them to 3D output tokens via cross-attention, while addressing occlusions with self-attention. To move away from object-centred datasets and enable full scene synthesis, NViST adopts a 6-DOF camera pose model and only requires relative pose, dropping the need for canonicalization of the training data, which removes a substantial barrier to it being used on casually captured datasets. We show results on unseen objects and categories from MVImgNet and even generalization to casual phone captures. We conduct qualitative and quantitative evaluations on MVImgNet and ShapeNet to show that our model represents a step forward towards enabling true in-the-wild generalizable novel-view synthesis from a single image. Project webpage: https://wbjang.github.io/nvist_webpage.
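The abstract states that NViST is conditioned on camera parameters via adaptive layer normalisation. The sketch below illustrates the general mechanism only, not the paper's implementation: tokens are layer-normalised without a fixed affine transform, and a per-channel scale and shift are regressed from a conditioning vector. The `(1 + gamma)` parameterisation, the single linear regressor, the random weights standing in for learned ones, and the four-element camera vector are all assumptions for illustration; `AdaLN` and `layer_norm` are hypothetical names.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalise each token over its feature dimension, with no learned affine."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class AdaLN:
    """Hypothetical adaptive LayerNorm whose scale and shift come from a conditioning vector."""

    def __init__(self, cond_dim: int, feat_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for parameters that would be learned during training.
        self.w = rng.normal(0.0, 0.02, size=(cond_dim, 2 * feat_dim))
        self.b = np.zeros(2 * feat_dim)

    def __call__(self, tokens: np.ndarray, cond: np.ndarray) -> np.ndarray:
        # tokens: (num_tokens, feat_dim); cond: (cond_dim,)
        gamma, beta = np.split(cond @ self.w + self.b, 2)
        # One per-channel scale and shift, broadcast over every token.
        return layer_norm(tokens) * (1.0 + gamma) + beta

rng = np.random.default_rng(1)
tokens = rng.normal(size=(16, 64))        # 16 tokens with 64 channels (illustrative)
camera = np.array([1.2, 0.8, 0.5, 0.5])   # hypothetical camera-parameter vector
ada = AdaLN(cond_dim=4, feat_dim=64)
out = ada(tokens, camera)
print(out.shape)  # (16, 64)
```

Because the scale and shift depend only on the conditioning vector, the same weights yield different token statistics for different cameras; with a zero camera vector (and zero bias) the layer reduces to plain layer normalisation.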
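The abstract describes translating fine-tuned MAE features into 3D output tokens via cross-attention. A minimal single-head sketch of that pattern follows, assuming a fixed set of query tokens that attend over encoder patch features. The token counts, widths, and random matrices (standing in for MAE features and learned weights) are illustrative assumptions, not values from the paper, and the self-attention the paper uses to address occlusions is omitted.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv):
    """Single-head cross-attention: query tokens attend over context tokens.

    queries: (num_out, d_model) output tokens
    context: (num_patches, d_enc) encoder patch features (e.g. from an MAE)
    Returns the attended values and the (num_out, num_patches) attention weights.
    """
    q = queries @ wq
    k = context @ wk
    v = context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_enc, d_model, d_head = 768, 256, 64
num_patches, num_out = 196, 32                    # illustrative sizes
context = rng.normal(size=(num_patches, d_enc))   # stand-in for MAE features
queries = rng.normal(size=(num_out, d_model))     # stand-in for learned output tokens
wq = rng.normal(0.0, 0.02, size=(d_model, d_head))
wk = rng.normal(0.0, 0.02, size=(d_enc, d_head))
wv = rng.normal(0.0, 0.02, size=(d_enc, d_head))

out, weights = cross_attention(queries, context, wq, wk, wv)
print(out.shape)  # (32, 64)
```

Each output token is a weighted average of projected patch features, so the number of output tokens is set by the queries rather than by the number of input patches.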
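NViST only requires relative pose rather than poses in a canonical world frame. One common way to obtain such a pose, assumed here rather than taken from the paper, is to express the target camera in the coordinate frame of the input camera, `inv(T_src) @ T_tgt`, using 4x4 camera-to-world matrices. The sketch shows why this removes the need for canonicalisation: re-expressing both cameras in any other world frame leaves the relative pose unchanged. Rotations are about the z-axis only for brevity, and the helper names are hypothetical.

```python
import numpy as np

def rotation_z(theta: float) -> np.ndarray:
    """3x3 rotation about the z-axis by `theta` radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_pose(rotation: np.ndarray, translation) -> np.ndarray:
    """4x4 camera-to-world matrix from a 3x3 rotation and a 3-vector translation."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def relative_pose(src_c2w: np.ndarray, tgt_c2w: np.ndarray) -> np.ndarray:
    """Target camera-to-world pose expressed in the source camera's frame."""
    r, t = src_c2w[:3, :3], src_c2w[:3, 3]
    # Closed-form inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t].
    src_w2c = np.eye(4)
    src_w2c[:3, :3] = r.T
    src_w2c[:3, 3] = -r.T @ t
    return src_w2c @ tgt_c2w

src = make_pose(rotation_z(0.3), [1.0, 2.0, 0.5])
tgt = make_pose(rotation_z(0.9), [0.0, 1.0, 1.5])
rel = relative_pose(src, tgt)

# Move both cameras into an arbitrary world frame g: the relative pose does not change.
g = make_pose(rotation_z(-1.2), [5.0, -3.0, 2.0])
rel_moved = relative_pose(g @ src, g @ tgt)
print(np.allclose(rel, rel_moved))  # True
```

The input camera always maps to the identity in this convention, so every training pair shares the same reference frame without any dataset-wide alignment step.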
