CV · May 25

Stabilizing Streaming Video Geometry via Dynamic Feature Normalization

arXiv:2605.25308 · 94.6
Predicted impact: top 12% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For applications requiring temporally consistent 3D geometry from video (e.g., autonomous driving, embodied AI), this work provides a practical solution to a known bottleneck (temporal inconsistency) with minimal overhead.

The paper traces scale-shift drift in streaming video geometry estimation to fluctuations in latent feature statistics and proposes Dynamic Feature Normalization (DyFN), a lightweight recurrent module that stabilizes those statistics. DyFN achieves state-of-the-art temporal stability, improving over prior streaming methods by up to 14% while adding only 2% additional parameters.
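One common way to make scale-shift drift concrete is to fit, for each frame, the scale and shift that best align the predicted depth with a reference in the least-squares sense, then watch how those two numbers move over time. The sketch below is an illustration under that assumption and is not taken from the paper; the function names `fit_scale_shift` and `per_frame_drift` are hypothetical, and depth maps are flattened to plain lists of floats.

```python
def fit_scale_shift(pred, ref):
    """Return (s, t) minimizing sum((s * p + t - r) ** 2) over paired samples."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    if var_p == 0.0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref)) / n
    s = cov / var_p          # closed-form least-squares slope
    t = mean_r - s * mean_p  # closed-form least-squares intercept
    return s, t


def per_frame_drift(frames_pred, frames_ref):
    """Fit (scale, shift) independently per frame.

    A temporally stable predictor gives nearly constant pairs;
    large frame-to-frame changes indicate scale-shift drift.
    """
    return [fit_scale_shift(p, r) for p, r in zip(frames_pred, frames_ref)]
```

Calling `per_frame_drift` on a short clip and inspecting the spread of the returned scales and shifts gives a rough, model-agnostic view of how much a per-frame predictor wanders.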

Consistent 3D geometry estimation from streaming RGB input is crucial for real-world applications such as autonomous driving, embodied AI, and large-scale reconstruction. While modern monocular geometry foundation models achieve strong single-image accuracy, they exhibit severe temporal inconsistency on continuous input, notably dominated by scale-shift drifting. Through targeted empirical analysis, we trace this instability to its root cause: fluctuations in latent feature statistics, whose mean and variance directly determine the predicted depth's scale and shift. Building on this insight, we introduce Dynamic Feature Normalization (DyFN), a lightweight, causal recurrent module that dynamically and robustly modulates feature statistics to maintain stable geometry over time. We adapt powerful pretrained monocular geometry models for streaming by finetuning only DyFN, a mere 2% additional parameters, while keeping the backbone frozen, thereby achieving temporal consistency without compromising single-image accuracy. Extensive experiments across four benchmarks show that DyFN effectively eliminates temporal artifacts such as disjointed layering and positional jitter, and achieves state-of-the-art temporal stability, improving over prior streaming methods by up to 14% and even outperforming heavier non-causal video baselines. Project Page: https://shawlyu.github.io/DyFN
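To illustrate the general idea of stabilizing feature statistics causally, the sketch below keeps a running mean and standard deviation across frames and re-expresses each frame's features in terms of those running values. It is a deliberately simplified stand-in: it uses a fixed exponential moving average, whereas the abstract describes DyFN as a learned recurrent module, and the class name `RunningStatNormalizer`, the `momentum` parameter, and the flat list-of-floats feature layout are all assumptions made for this example.

```python
import math


class RunningStatNormalizer:
    """Causal feature re-normalization with exponential moving averages.

    Hypothetical illustration only; not the DyFN architecture.
    """

    def __init__(self, momentum=0.9, eps=1e-6):
        self.momentum = momentum  # weight given to past statistics
        self.eps = eps            # guards against division by zero
        self.mean = None          # running mean across frames
        self.std = None           # running standard deviation across frames

    def __call__(self, feats):
        n = len(feats)
        mu = sum(feats) / n
        sigma = math.sqrt(sum((f - mu) ** 2 for f in feats) / n)
        if self.mean is None:
            # First frame: initialize running statistics from it.
            self.mean, self.std = mu, sigma
        else:
            # Later frames: blend in the new statistics (past frames only, so causal).
            m = self.momentum
            self.mean = m * self.mean + (1.0 - m) * mu
            self.std = m * self.std + (1.0 - m) * sigma
        # Standardize with this frame's statistics, then restore the running ones.
        return [(f - mu) / (sigma + self.eps) * self.std + self.mean for f in feats]
```

With a high `momentum`, a frame whose raw features suddenly jump in mean or variance is mapped back close to the statistics of earlier frames, which is the kind of stabilization the abstract attributes to modulating feature statistics over time.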

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
