CV · Mar 6

OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer

arXiv:2603.05959v1 · h-index: 10
Predicted impact: top 41% in CV · last 90 days · Originality: Highly original
AI Analysis

This work targets the unbounded memory and compute growth of streaming 3D geometry reconstruction, a critical limitation of existing causal-attention models in long-horizon applications.

This paper addresses 3D geometry reconstruction from streaming video by introducing OVGGT, a training-free framework that keeps memory and compute costs constant regardless of video length. It does so by combining Self-Selective Caching, which compresses the KV cache, with Dynamic Anchor Protection, which suppresses geometric drift, and reports state-of-the-art 3D geometric accuracy on arbitrarily long videos.

Reconstructing 3D geometry from streaming video requires continuous inference under bounded resources. Recent geometric foundation models achieve impressive reconstruction quality through all-to-all attention, yet their quadratic cost confines them to short, offline sequences. Causal-attention variants such as StreamVGGT enable single-pass streaming but accumulate an ever-growing KV cache, exhausting GPU memory within hundreds of frames and precluding the long-horizon deployment that motivates streaming inference in the first place. We present OVGGT, a training-free framework that bounds both memory and compute to a fixed budget regardless of sequence length. Our approach combines Self-Selective Caching, which leverages FFN residual magnitudes to compress the KV cache while remaining fully compatible with FlashAttention, with Dynamic Anchor Protection, which shields coordinate-critical tokens from eviction to suppress geometric drift over extended trajectories. Extensive experiments on indoor, outdoor, and ultra-long sequence benchmarks demonstrate that OVGGT processes arbitrarily long videos within a constant VRAM envelope while achieving state-of-the-art 3D geometric accuracy.
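The fixed-budget idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only says that eviction is driven by FFN residual magnitudes and that coordinate-critical tokens are shielded, so the per-token `score` values and the `anchor_ids` argument below are hypothetical inputs, and the class and method names are invented for illustration. The sketch shows only the bookkeeping: after each frame, the lowest-scoring unprotected tokens are dropped until the cache fits the budget. Because the score is a per-token quantity rather than an attention weight, a selection rule of this shape does not need the attention matrix, which is plausibly what keeps such a scheme compatible with FlashAttention.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class CachedToken:
    frame: int               # index of the frame that produced this token
    score: float             # importance score (assumed: an FFN residual magnitude)
    is_anchor: bool = False  # anchors are never evicted in this sketch


@dataclass
class BoundedKVCache:
    """Hypothetical fixed-budget token cache; stores metadata only, not real K/V tensors."""
    budget: int
    tokens: List[CachedToken] = field(default_factory=list)

    def append_frame(self, frame: int, scores: List[float],
                     anchor_ids: Iterable[int] = ()) -> None:
        """Add one frame's tokens, then evict down to the budget."""
        anchors = set(anchor_ids)
        for i, s in enumerate(scores):
            self.tokens.append(CachedToken(frame, s, i in anchors))
        self._evict()

    def _evict(self) -> None:
        overflow = len(self.tokens) - self.budget
        if overflow <= 0:
            return
        # Rank only unprotected tokens, lowest score first.
        candidates = sorted(
            (i for i, t in enumerate(self.tokens) if not t.is_anchor),
            key=lambda i: self.tokens[i].score,
        )
        # If anchors alone exceed the budget, the cache stays over budget;
        # a real system would also need to cap the number of anchors.
        drop = set(candidates[:overflow])
        self.tokens = [t for i, t in enumerate(self.tokens) if i not in drop]


cache = BoundedKVCache(budget=4)
cache.append_frame(0, [0.9, 0.1, 0.5], anchor_ids=[1])  # token 1 is protected
cache.append_frame(1, [0.2, 0.8, 0.3])
print([(t.frame, t.score, t.is_anchor) for t in cache.tokens])
```

In this toy run the cache holds four tokens after the second frame: the low-scoring anchor from frame 0 survives, while the two lowest-scoring unprotected tokens from frame 1 are evicted. The cache size therefore stays at the budget however many frames are appended, as long as the anchors themselves fit within it.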
