CV | Feb 12

WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains

arXiv:2602.11845v11 citations | h-index: 19 | Has Code
AI Analysis

This paper addresses efficient motion representation in dynamic reconstruction for computer vision, offering incremental improvements over existing methods.

The paper tackles dynamic reconstruction from monocular video by proposing WorldTree, a unified framework with Temporal Partition Tree and Spatial Ancestral Chains, achieving an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck compared to the second-best method.

Dynamic reconstruction has made remarkable progress, but monocular input, needed for more practical applications, remains challenging. Prevailing works attempt to construct efficient motion representations but lack a unified spatiotemporal decomposition framework, and so suffer from either holistic temporal optimization or coupled hierarchical spatial composition. To this end, we propose WorldTree, a unified framework with two components. The Temporal Partition Tree (TPT) enables coarse-to-fine optimization through an inheritance-based partition tree structure for hierarchical temporal decomposition. The Spatial Ancestral Chains (SAC) recursively query the ancestral hierarchical structure to provide complementary spatial dynamics while specializing motion representations across ancestral nodes. Experiments on different datasets show that our method achieves an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck compared to the second-best method. Code: https://github.com/iCVTEAM/WorldTree.
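
To make the two ideas in the abstract more concrete, here is a minimal Python sketch of a temporal partition tree with ancestral-chain queries. It is an illustration under stated assumptions, not the authors' implementation: the binary split of the frame range, the per-node 3D `offset`, the zero-residual initialization used to model inheritance, and the additive composition along the chain are all hypothetical choices, as are the names `TimeNode`, `build_tree`, `find_leaf`, `ancestral_chain`, and `composed_offset`. The abstract does not specify any of these details; see the linked repository for the actual method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TimeNode:
    """Node of a hypothetical temporal partition tree covering frames [start, end)."""
    start: int
    end: int
    depth: int
    parent: Optional["TimeNode"] = None
    children: List["TimeNode"] = field(default_factory=list)
    # Placeholder motion parameter: a residual 3D offset relative to the ancestors.
    # Assumption: a new node starts at zero residual, so at creation its composed
    # motion equals its parent's (one simple way to model "inheritance").
    offset: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])


def build_tree(start: int, end: int, max_depth: int,
               depth: int = 0, parent: Optional[TimeNode] = None) -> TimeNode:
    """Recursively halve the frame range until max_depth or single-frame segments."""
    node = TimeNode(start, end, depth, parent)
    if depth < max_depth and end - start > 1:
        mid = (start + end) // 2
        node.children = [
            build_tree(start, mid, max_depth, depth + 1, node),
            build_tree(mid, end, max_depth, depth + 1, node),
        ]
    return node


def find_leaf(root: TimeNode, t: int) -> TimeNode:
    """Descend to the finest temporal segment that contains frame t."""
    if not (root.start <= t < root.end):
        raise ValueError(f"frame {t} is outside [{root.start}, {root.end})")
    node = root
    while node.children:
        node = next(c for c in node.children if c.start <= t < c.end)
    return node


def ancestral_chain(node: TimeNode) -> List[TimeNode]:
    """Return the nodes from the root down to `node` (coarse to fine)."""
    chain = []
    current: Optional[TimeNode] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain[::-1]


def composed_offset(root: TimeNode, t: int) -> List[float]:
    """Sum the residual offsets along the ancestral chain for frame t."""
    total = [0.0, 0.0, 0.0]
    for node in ancestral_chain(find_leaf(root, t)):
        total = [a + b for a, b in zip(total, node.offset)]
    return total


# Usage: 8 frames, two levels of temporal subdivision.
root = build_tree(0, 8, max_depth=2)
root.offset = [1.0, 0.0, 0.0]              # coarse motion shared by all frames
root.children[1].offset = [0.0, 2.0, 0.0]  # refinement for frames 4..7
print(composed_offset(root, 5))
```

In this sketch every frame's motion is the sum of a coarse term shared across the whole sequence and progressively finer residuals, so a coarse-to-fine schedule could fit the root first and then the deeper levels. The sketch builds the whole tree up front for simplicity.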