AI | Apr 16, 2025

Towards LLM Agents for Earth Observation

arXiv:2504.12110v2 | 7 citations | h-index: 21
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of automating Earth Observation for environmental monitoring and disaster management, but it is an incremental advance: it focuses on reducing code failure rates rather than delivering a breakthrough.

The paper tackles the reliability of AI systems for Earth Observation by introducing a benchmark of 140 yes/no questions. It finds that LLM agents reach only 33% accuracy because their code fails to run more than 58% of the time. It then reduces that failure rate for open models by fine-tuning on synthetic data, which lets a much smaller model achieve accuracy comparable to much larger ones.

Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: Are AI systems ready for reliable Earth Observation? We introduce UnivEarth, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents can only achieve an accuracy of 33% because the code fails to run over 58% of the time. We improve the failure rate for open models by fine-tuning on synthetic data, allowing much smaller models (e.g., Llama-3.1-8B) to achieve comparable accuracy to much larger ones (e.g., DeepSeek-R1). Taken together, our findings identify significant challenges to be solved before AI agents can automate Earth Observation, and suggest paths forward. The project page is available at https://iandrover.github.io/UnivEarth.
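The accuracy and failure-rate figures in the abstract are linked: a question whose generated code never runs cannot be answered correctly. The sketch below illustrates one way such a yes/no benchmark could be scored. It is not the authors' evaluation code; the `Result` record, the `score` function, and the convention that a failed run counts as a wrong answer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """One benchmark question (hypothetical record layout)."""
    expected: str             # ground-truth answer: "yes" or "no"
    predicted: Optional[str]  # agent's answer, or None if its code failed to run


def score(results: list[Result]) -> dict[str, float]:
    """Compute overall accuracy, failure rate, and accuracy over successful runs.

    Assumption: a run whose code failed is counted as an incorrect answer.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to score")

    # Runs that produced no answer because the generated code did not execute.
    failures = sum(1 for r in results if r.predicted is None)

    # Case- and whitespace-insensitive comparison of yes/no answers.
    correct = sum(
        1
        for r in results
        if r.predicted is not None
        and r.predicted.strip().lower() == r.expected.strip().lower()
    )

    ran = total - failures
    return {
        "accuracy": correct / total,
        "failure_rate": failures / total,
        "accuracy_when_ran": correct / ran if ran else 0.0,
    }


if __name__ == "__main__":
    demo = [
        Result("yes", "Yes"),  # correct
        Result("no", "yes"),   # ran, but wrong
        Result("yes", None),   # code failed to run
        Result("no", None),    # code failed to run
    ]
    print(score(demo))
```

Reporting accuracy over successful runs alongside overall accuracy separates two failure modes: code that does not execute, and code that executes but yields the wrong answer.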

