CV | AI | Jan 21

SpatialMem: Unified 3D Memory with Metric Anchoring and Fast Retrieval

arXiv:2601.14895v1 | 1 citation | h-index: 6
Originality: Highly original
AI Analysis

This paper addresses embodied spatial intelligence for robotics and AR/VR applications, taking an incremental, hybrid approach.

The researchers set out to build a unified 3D memory system for indoor environments from casually captured RGB video. The resulting system, SpatialMem, maintains strong navigation completion and retrieval accuracy across three real-life scenes under clutter and occlusion.

We present SpatialMem, a memory-centric system that unifies 3D geometry, semantics, and language into a single, queryable representation. Starting from casually captured egocentric RGB video, SpatialMem reconstructs metrically scaled indoor environments, detects structural 3D anchors (walls, doors, windows) as the first-layer scaffold, and populates a hierarchical memory with open-vocabulary object nodes -- linking evidence patches, visual embeddings, and two-layer textual descriptions to 3D coordinates -- for compact storage and fast retrieval. This design enables interpretable reasoning over spatial relations (e.g., distance, direction, visibility) and supports downstream tasks such as language-guided navigation and object retrieval without specialized sensors. Experiments across three real-life indoor scenes demonstrate that SpatialMem maintains strong anchor-description-level navigation completion and hierarchical retrieval accuracy under increasing clutter and occlusion, offering an efficient and extensible framework for embodied spatial intelligence.
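To make the abstract's memory layout more concrete, here is a minimal Python sketch of an anchor-first hierarchical memory with embedding-based retrieval and two simple spatial relations (distance and direction). It is an illustration under stated assumptions, not the authors' implementation: the names `Anchor`, `ObjectNode`, `SpatialMemory`, `bearing_deg`, and `cosine`, the nearest-anchor attachment rule, and the toy 2D embeddings are all hypothetical, and visibility reasoning is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """Open-vocabulary object entry tied to a 3D coordinate (metres)."""
    label: str
    position: tuple          # (x, y, z)
    embedding: list          # toy stand-in for a visual embedding
    short_desc: str = ""     # first textual layer (assumed: brief caption)
    long_desc: str = ""      # second textual layer (assumed: detailed caption)


@dataclass
class Anchor:
    """Structural anchor (wall, door, window) in the first-layer scaffold."""
    name: str
    kind: str
    position: tuple
    objects: list = field(default_factory=list)


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)


def bearing_deg(src, dst):
    """Horizontal direction from src to dst, degrees counter-clockwise from +x."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0])) % 360.0


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


class SpatialMemory:
    def __init__(self):
        self.anchors = {}

    def add_anchor(self, anchor):
        self.anchors[anchor.name] = anchor

    def add_object(self, node):
        """Attach node to the nearest anchor (an assumed rule); return that anchor."""
        nearest = min(self.anchors.values(),
                      key=lambda a: distance(a.position, node.position))
        nearest.objects.append(node)
        return nearest

    def retrieve(self, query_embedding, top_k=1):
        """Return (anchor name, object) pairs ranked by embedding similarity."""
        scored = [(cosine(query_embedding, o.embedding), a, o)
                  for a in self.anchors.values() for o in a.objects]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(a.name, o) for _, a, o in scored[:top_k]]


mem = SpatialMemory()
mem.add_anchor(Anchor("door_1", "door", (0.0, 0.0, 0.0)))
mem.add_anchor(Anchor("window_1", "window", (4.0, 3.0, 0.0)))
mug = ObjectNode("mug", (3.0, 3.0, 0.8), [1.0, 0.0],
                 "a mug", "a white mug on the desk near the window")
lamp = ObjectNode("lamp", (0.5, 0.5, 1.0), [0.0, 1.0],
                  "a lamp", "a floor lamp beside the door")
mug_anchor = mem.add_object(mug)
lamp_anchor = mem.add_object(lamp)
hits = mem.retrieve([0.9, 0.1], top_k=1)
```

In this toy scene the mug attaches to `window_1` and the lamp to `door_1`, so a query embedding close to the mug's returns the mug together with its anchor. That pairing is what lets a textual answer be phrased relative to a structural landmark. A real system would replace the linear scan with an indexed search and use learned embeddings.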

