CV · May 15

GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction

arXiv:2605.15852 · Has Code
AI Analysis

For researchers and practitioners in 3D reconstruction from video, GHOST provides a training-free method that reduces memory use and speeds up inference while preserving reconstruction quality.

GHOST addresses the memory bottleneck in streaming 3D reconstruction from long monocular videos by using the model's own 3D geometry outputs to evict redundant tokens online. This cuts the KV cache by nearly half and makes inference 1.75x faster than state-of-the-art methods while preserving reconstruction quality.
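To see why the cache becomes a bottleneck, the back-of-the-envelope sketch below estimates KV cache memory. The model dimensions and tokens-per-frame are hypothetical placeholders, not the configuration of GHOST or its base model, and the budget value is likewise made up. The point is only that memory grows linearly with the number of frames unless eviction holds the cache to a token budget, and that halving the cached token count halves the memory.

```python
from typing import Optional


def kv_cache_bytes(num_tokens: int, num_layers: int, num_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Memory for cached keys and values of `num_tokens` tokens in every layer.

    The factor 2 covers storing both a key and a value vector per token,
    head, and layer; bytes_per_elem=2 corresponds to fp16/bf16.
    """
    return 2 * num_tokens * num_layers * num_heads * head_dim * bytes_per_elem


def cached_tokens(num_frames: int, tokens_per_frame: int,
                  budget: Optional[int] = None) -> int:
    """Tokens held in the cache after `num_frames` frames.

    Without eviction the count grows linearly with the number of frames;
    with a token budget it is capped once the budget is reached.
    """
    total = num_frames * tokens_per_frame
    return total if budget is None else min(total, budget)


# Hypothetical configuration, NOT taken from the GHOST paper.
CONFIG = dict(num_layers=24, num_heads=16, head_dim=64)
TOKENS_PER_FRAME = 1000
BUDGET = 250_000  # hypothetical token budget

for frames in (100, 500, 1000):
    full = kv_cache_bytes(cached_tokens(frames, TOKENS_PER_FRAME), **CONFIG)
    capped = kv_cache_bytes(cached_tokens(frames, TOKENS_PER_FRAME, BUDGET), **CONFIG)
    print(f"{frames:5d} frames: full {full / 2**30:6.2f} GiB, "
          f"budgeted {capped / 2**30:6.2f} GiB")
```

Because the estimate is a plain product, any real model's numbers can be substituted directly; only the linear dependence on the token count matters for the argument.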

Streaming 3D reconstruction from long monocular video sequences requires maintaining a key-value (KV) cache that grows linearly with sequence length, creating a severe memory bottleneck. Existing approaches either truncate the cache to a fixed set of anchor frames, which degrades reconstruction quality, or rely on attention-score heuristics that are agnostic to 3D scene structure and therefore fail to preserve geometrically valuable tokens. To address these problems, we present GHOST (Geometry-Hierarchical Online Streaming Token Eviction), a training-free KV cache management framework that exploits the model's own 3D geometry outputs to evict redundant tokens online. GHOST introduces three mutually reinforcing innovations: a hierarchical dual-level importance scoring scheme, a privilege mechanism that protects special tokens from eviction, and a cosine-similarity-guided layer-wise budget allocation. Experiments on multiple benchmarks show that GHOST preserves excellent reconstruction quality while cutting the KV cache by nearly half and delivering 1.75x faster inference compared to state-of-the-art methods. Our code is available at https://github.com/lokiniuniu/GHOST.
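The three mechanisms named in the abstract can be illustrated with a small NumPy sketch. It is not GHOST's implementation: the abstract gives no formulas, so the product rule in `hierarchical_scores`, the "higher mean cosine similarity means more redundant, so a smaller budget" rule in `allocate_layer_budgets`, and every function name here are hypothetical. The geometry-derived inputs (`frame_scores`, `token_scores`) are taken as given; in GHOST they would be derived from the model's own 3D outputs, which this sketch does not compute.

```python
import numpy as np


def hierarchical_scores(frame_scores, token_scores, frame_ids):
    """Dual-level importance (hypothetical product rule).

    frame_scores: (num_frames,) coarse score per cached frame.
    token_scores: (num_tokens,) fine score per cached token.
    frame_ids:    (num_tokens,) index of the frame each token came from.
    """
    return frame_scores[frame_ids] * token_scores


def evict(scores, privileged, budget):
    """Return sorted indices of the tokens to keep in one layer's cache.

    `privileged` is a boolean mask. Privileged (special) tokens are always
    kept, even if they alone exceed the budget; the remaining slots go to
    the highest-scoring ordinary tokens.
    """
    privileged_idx = np.flatnonzero(privileged)
    ordinary_idx = np.flatnonzero(~privileged)
    slots = max(budget - len(privileged_idx), 0)
    # Stable sort on negated scores: descending by score, ties keep cache order.
    order = np.argsort(-scores[ordinary_idx], kind="stable")
    kept = np.concatenate([privileged_idx, ordinary_idx[order[:slots]]])
    return np.sort(kept)


def allocate_layer_budgets(layer_keys, total_budget):
    """Split a total token budget across layers (hypothetical rule).

    A layer whose cached keys have a higher mean pairwise cosine similarity
    is treated as more redundant and receives a smaller share. Budgets are
    floored, so their sum never exceeds total_budget. Each layer needs at
    least 2 cached tokens.
    """
    weights = []
    for keys in layer_keys:  # keys: (num_tokens, head_dim)
        unit = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
        sim = unit @ unit.T
        n = sim.shape[0]
        mean_offdiag = (sim.sum() - np.trace(sim)) / (n * (n - 1))
        # Less redundant layer -> larger weight; clip to stay positive.
        weights.append(max(1.0 - mean_offdiag, 1e-6))
    weights = np.asarray(weights)
    return np.floor(weights / weights.sum() * total_budget).astype(int)


# Usage with toy data: layer 2's keys are identical, i.e. maximally redundant.
rng = np.random.default_rng(0)
budgets = allocate_layer_budgets(
    [rng.standard_normal((10, 8)), np.ones((10, 8))], total_budget=20)

scores = hierarchical_scores(np.array([1.0, 0.5]),
                             np.array([0.9, 0.2, 0.6, 0.8, 0.4]),
                             np.array([0, 0, 0, 1, 1]))
privileged = np.array([False, True, False, False, False])  # stand-in special token
keep = evict(scores, privileged, budget=3)
```

In a streaming loop, one would plausibly rerun these steps after each new frame is appended, calling `evict` once per layer with that layer's budget so every layer's cache stays bounded; whether GHOST schedules eviction this way is an assumption here.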

