CV | AI | Sep 5, 2025

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

arXiv:2509.05296v1 | 21 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work targets efficient, high-quality online 3D reconstruction for applications such as robotics and AR/VR. It is an incremental improvement over existing methods.

The paper tackles the trade-off between reconstruction quality and real-time performance in online 3D reconstruction by introducing WinT3R, which the authors report achieves state-of-the-art reconstruction quality, camera pose estimation, and speed across diverse datasets.

We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without heavy computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
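
The two ideas in the abstract, a sliding window over the incoming frames and a global pool of compact camera tokens, can be illustrated with a small bookkeeping sketch. This is not the authors' implementation: the function names, the window size, the stride (and therefore the overlap between windows), and the string placeholders standing in for camera tokens are all assumptions made for illustration. The actual network, token format, and windowing parameters are defined in the paper and its repository.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple


def sliding_windows(frames: Iterable[int], size: int, stride: int) -> Iterator[List[int]]:
    """Yield windows of `size` frame ids, advancing by `stride` frames.

    Hypothetical helper: the overlap (size - stride) is an assumption,
    not a value taken from the paper.
    """
    if not 0 < stride <= size:
        raise ValueError("stride must satisfy 0 < stride <= size")
    buf: deque = deque(maxlen=size)
    new = 0        # frames received since the last emitted window
    need = size    # the first window needs `size` frames
    for frame in frames:
        buf.append(frame)
        new += 1
        if new == need:
            yield list(buf)
            new, need = 0, stride  # later windows need `stride` new frames


def run(frames: Iterable[int], size: int = 4, stride: int = 2) -> List[Tuple[List[int], int]]:
    """Process a stream window by window while growing a global token pool."""
    pool: Dict[int, str] = {}  # frame id -> placeholder for a compact camera token
    log = []
    for window in sliding_windows(frames, size, stride):
        for frame in window:
            # One compact token per frame; frames already seen keep their token.
            pool.setdefault(frame, f"cam_token_{frame}")
        # In a real model, pose prediction for this window would be able to
        # attend to every token in `pool`, not just those of the current window.
        log.append((window, len(pool)))
    return log


if __name__ == "__main__":
    for window, pool_size in run(range(7)):
        print(window, pool_size)
```

In this sketch the per-window cost depends only on the window size, while the pool grows by one small entry per frame, which mirrors the abstract's claim that global camera context can be kept without sacrificing efficiency. Trailing frames that do not complete a window are simply left unprocessed here.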
