Multimodal LLM Guided Exploration and Active Mapping using Fisher Information
This work addresses the problem of efficient, uncertainty-aware exploration for embodied agents; it is an incremental improvement that integrates existing methods.
The paper tackles active mapping by combining multimodal LLMs for long-horizon planning with an information-based objective for motion planning, achieving state-of-the-art results on Gibson and Habitat-Matterport 3D datasets.
We present an active mapping system that plans for both long-horizon exploration goals and short-term actions using a 3D Gaussian Splatting (3DGS) representation. Existing methods either do not take advantage of recent developments in multimodal Large Language Models (LLMs) or do not account for localization uncertainty, which is critical for embodied agents. We propose employing multimodal LLMs for long-horizon planning in conjunction with detailed motion planning using our information-based objective. By leveraging high-quality view synthesis from our 3DGS representation, our method employs a multimodal LLM as a zero-shot planner that selects long-horizon exploration goals from a semantic perspective. We also introduce an uncertainty-aware path proposal and selection algorithm that balances the dual objectives of maximizing the information gain for the environment while minimizing the cost of localization errors. Experiments conducted on the Gibson and Habitat-Matterport 3D datasets show that the proposed method achieves state-of-the-art results.
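To make the trade-off in the path selection step concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each candidate path already comes with per-step estimates of map information gain and localization-error cost, and it combines them with a simple weighted difference. The names `CandidatePath`, `path_score`, `select_path`, and the `trade_off` weight are hypothetical, introduced only for illustration; the abstract does not specify how the two objectives are actually combined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CandidatePath:
    """Hypothetical container for one proposed path (not from the paper)."""
    name: str
    info_gain: Sequence[float]  # assumed per-step information gain about the map
    loc_cost: Sequence[float]   # assumed per-step localization-error cost


def path_score(path: CandidatePath, trade_off: float = 1.0) -> float:
    """Total information gain minus the weighted total localization cost.

    The weighted difference is an assumption made for this sketch.
    """
    return sum(path.info_gain) - trade_off * sum(path.loc_cost)


def select_path(paths: List[CandidatePath], trade_off: float = 1.0) -> CandidatePath:
    """Return the candidate path with the highest score."""
    if not paths:
        raise ValueError("at least one candidate path is required")
    return max(paths, key=lambda p: path_score(p, trade_off))


# Made-up numbers: the second path sees more of the scene but is harder
# to localize along, so it loses once localization cost is weighted in.
candidates = [
    CandidatePath("open_corridor", info_gain=[4.0, 3.0, 2.0], loc_cost=[0.5, 0.5, 0.5]),
    CandidatePath("textureless_wall", info_gain=[5.0, 4.0, 3.0], loc_cost=[3.0, 3.0, 3.0]),
]
best = select_path(candidates, trade_off=1.0)
print(best.name)
```

Setting `trade_off` to zero reduces the sketch to pure information-gain maximization, in which case the higher-gain but harder-to-localize path is chosen instead; this is the behavior an uncertainty-aware selection is meant to avoid.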