RO · AI · Mar 4

VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

arXiv:2603.04277v1
h-index: 8
Originality: Highly original
AI Analysis

This work is significant for improving the safety and reliability of autonomous aerial robots operating in GPS-denied environments by providing a robust method for metric scale recovery, which is crucial for LLM/VLM-based planners.

This paper addresses the problem of estimating absolute metric scale for UAVs in GPS-denied environments, where state-of-the-art VLMs show median area estimation errors above 50%. The authors propose VANGUARD, a geometric perception tool that estimates Ground Sample Distance (GSD) from detected vehicles. It achieves a 6.87% median GSD error on the DOTA v1.5 benchmark and, in the downstream area measurement pipeline, a 19.7% median error with 4x fewer catastrophic failures than the best VLM baseline.

Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical -- yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark -- with 2.6x lower category dependence and 4x fewer catastrophic failures than the best VLM baseline -- demonstrating that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
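
The core computation described in the abstract (the modal pixel length of detected vehicles, divided into a reference length) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 4.5 m reference length, the normal-reference bandwidth rule, and the grid search for the density peak are all assumptions made for illustration. The abstract says only that a pre-calibrated reference length and kernel density estimation are used. The composite confidence score is omitted because the abstract does not say how it is composed.

```python
import math
from statistics import stdev


def kde_mode(samples, grid_points=512):
    """Return the peak location of a Gaussian KDE fitted to `samples`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = stdev(samples)
    if sigma == 0:
        # All samples identical: the mode is that value.
        return float(samples[0])
    # Normal-reference rule-of-thumb bandwidth (an assumption; the
    # source does not state which bandwidth rule is used).
    h = 1.06 * sigma * n ** (-1 / 5)
    lo, hi = min(samples), max(samples)
    best_x, best_density = lo, -1.0
    # Evaluate the unnormalised density on a regular grid and keep the peak.
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        density = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(vehicle_lengths_px, reference_length_m=4.5):
    """GSD in metres per pixel: reference length / modal pixel length.

    `reference_length_m` is a hypothetical placeholder, not the
    pre-calibrated value from the paper.
    """
    return reference_length_m / kde_mode(vehicle_lengths_px)


def area_from_pixels(pixel_count, gsd_m_per_px):
    """Convert a segmentation mask's pixel count to square metres."""
    return pixel_count * gsd_m_per_px ** 2


if __name__ == "__main__":
    # Long-side lengths (pixels) of oriented boxes; 60 and 12 mimic outliers
    # such as a truck and a truncated detection.
    lengths_px = [30, 31, 29, 30, 30.5, 29.5, 60, 12]
    gsd = estimate_gsd(lengths_px)
    print(f"GSD ~ {gsd:.3f} m/px, mask of 1000 px ~ {area_from_pixels(1000, gsd):.1f} m^2")
```

Using the density peak rather than the mean keeps the estimate anchored to the dominant cluster of vehicle lengths, so a few outlier detections shift the result only slightly. Because area scales with the square of GSD, a relative GSD error roughly doubles when carried into an area estimate.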
