RO, AI, CV, Aug 30, 2025

Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

arXiv:2509.00465v1
Originality: Incremental advance
AI Analysis

The thesis addresses the problem of bridging LLMs with physical embodiment for robotics. The advance is incremental because it builds on existing methods in scene modeling and reasoning.

This thesis tackles the challenge of enabling robots to perceive and act on natural language instructions by developing robust implicit neural models for scene representation and enhancing LLMs for spatial reasoning. Its contributions include self-supervised camera calibration and a novel navigation benchmark.

This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs) and physical embodiment, we present contributions on two fronts: scene representation and spatial reasoning. For perception, we develop robust, scalable, and accurate scene representations using implicit neural models, with contributions in self-supervised camera calibration, high-fidelity depth field generation, and large-scale reconstruction. For spatial reasoning, we enhance the spatial capabilities of LLMs by introducing a novel navigation benchmark, a method for grounding language in 3D, and a state-feedback mechanism to improve long-horizon decision-making. This work lays a foundation for robots that can robustly perceive their surroundings and intelligently act upon complex, language-based commands.

