CL AI LG
Apr 23, 2024

Evaluating Tool-Augmented Agents in Remote Sensing Platforms

Simranjit Singh, Michael Fore, Dimitrios Stamoulis

arXiv:2405.00709v19.115 citations
h-index: 7

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more realistic evaluation of agents in remote sensing applications, though it is incremental as it focuses on benchmarking rather than a new method.

The paper addresses the gap between existing benchmarks and realistic user-grounded tasks when evaluating tool-augmented LLMs in remote sensing. It introduces the GeoLLM-QA benchmark and reports insights from evaluating state-of-the-art LLMs on 1,000 tasks.

Tool-augmented Large Language Models (LLMs) have shown impressive capabilities in remote sensing (RS) applications. However, existing benchmarks assume question-answering input templates over predefined image-text data pairs. These standalone instructions neglect the intricacies of realistic user-grounded tasks. Consider a geospatial analyst: they zoom in on a map area, they draw a region over which to collect satellite imagery, and they succinctly ask "Detect all objects here". Where is "here", if it is not explicitly hardcoded in the image-text template, but instead is implied by the system state, e.g., the live map positioning? To bridge this gap, we present GeoLLM-QA, a benchmark designed to capture long sequences of verbal, visual, and click-based actions on a real UI platform. Through in-depth evaluation of state-of-the-art LLMs over a diverse set of 1,000 tasks, we offer insights towards stronger agents for RS applications.
