CV · Aug 28, 2024

3D Reconstruction with Spatial Memory

arXiv:2408.16061v142.6256 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This paper addresses the problem of efficient, accurate 3D reconstruction for computer vision applications and represents an incremental improvement over existing methods such as DUSt3R.

The paper tackles dense 3D reconstruction from image collections by introducing Spann3R, a transformer-based architecture with an external spatial memory that predicts per-image pointmaps in a global coordinate system. This eliminates the need for optimization-based global alignment. The model shows competitive performance and generalization on unseen datasets and can process ordered image collections in real time.

We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections. Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters. Unlike DUSt3R, which predicts per image-pair pointmaps each expressed in its local coordinate frame, Spann3R can predict per-image pointmaps expressed in a global coordinate system, thus eliminating the need for optimization-based global alignment. The key idea of Spann3R is to manage an external spatial memory that learns to keep track of all previous relevant 3D information. Spann3R then queries this spatial memory to predict the 3D structure of the next frame in a global coordinate system. Taking advantage of DUSt3R's pre-trained weights, and further fine-tuning on a subset of datasets, Spann3R shows competitive performance and generalization ability on various unseen datasets and can process ordered image collections in real time. Project page: https://hengyiwang.github.io/projects/spanner
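The memory-query idea in the abstract can be illustrated with a toy key-value memory that is read by softmax attention and extended after every frame. This is only a minimal sketch under stated assumptions, not Spann3R's actual architecture: the names `SpatialMemory`, `write`, `read`, and `process_sequence` are hypothetical, frames are stand-in feature vectors rather than images, and the learned encoders and decoders are omitted entirely.

```python
import math


class SpatialMemory:
    """Toy key-value memory (hypothetical; not the paper's implementation)."""

    def __init__(self):
        self.keys = []    # one feature vector per stored entry
        self.values = []  # one value vector per stored entry

    def write(self, key, value):
        """Append a (key, value) pair to the memory."""
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query):
        """Return the softmax-attention-weighted average of stored values."""
        if not self.keys:
            raise ValueError("memory is empty")
        scale = math.sqrt(len(query))
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in self.keys
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]


def process_sequence(frame_features, memory):
    """Assumed streaming loop: read the memory for each frame, then write to it.

    The first frame has nothing to read, so it is passed through unchanged,
    loosely mirroring a first frame that anchors the global coordinate system.
    """
    outputs = []
    for feat in frame_features:
        context = memory.read(feat) if memory.keys else list(feat)
        outputs.append(context)
        memory.write(feat, context)
    return outputs
```

Calling `process_sequence([[1.0, 0.0], [0.0, 1.0]], SpatialMemory())` processes two stand-in frames in order. Because each frame only reads the memory once and appends to it, no pass over all image pairs is needed afterwards, which is the property the abstract contrasts with optimization-based global alignment.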
