AI | Apr 16, 2025

Towards LLM Agents for Earth Observation

Chia Hsiang Kao, Wenting Zhao, Shreelekha Revankar, Samuel Speas, Snehal Bhagat, Rajeev Datta, Cheng Perng Phoo, Utkarsh Mall, Carl Vondrick, Kavita Bala, Bharath Hariharan

arXiv:2504.12110v213.610 citations | h-index: 32

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating Earth Observation for environmental monitoring and disaster management. It is an incremental advance: it focuses on reducing code failure rates rather than introducing a breakthrough.

The paper tackles the reliability of AI systems for Earth Observation by introducing a benchmark of 140 yes/no questions. It finds that LLM agents achieve only 33% accuracy, largely because their generated code fails to run over 58% of the time. It then reduces the failure rate by fine-tuning open models on synthetic data, which allows smaller models to reach accuracy comparable to much larger ones.

Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: Are AI systems ready for reliable Earth Observation? We introduce UnivEARTH, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents achieve an accuracy of only 33% because the code fails to run over 58% of the time. We reduce the failure rate for open models by fine-tuning on synthetic data, allowing much smaller models (Llama-3.1-8B) to achieve comparable accuracy to much larger ones (e.g., DeepSeek-R1). Taken together, our findings identify significant challenges to be solved before AI agents can automate Earth Observation, and suggest paths forward. The project page is available at https://iandrover.github.io/UnivEarth.
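
To make the relationship between the two headline numbers concrete, here is a minimal sketch of how accuracy and code failure rate could be scored on a yes/no benchmark. It assumes that a run whose code fails produces no answer and is counted as incorrect; the abstract does not confirm this scoring rule. The `Result` schema, the `score` function, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Outcome of one agent attempt on one yes/no question (hypothetical schema)."""
    gold: str                  # reference answer: "yes" or "no"
    predicted: Optional[str]   # agent's answer, or None if its code failed to run


def score(results):
    """Return overall accuracy, code failure rate, and accuracy on runs that executed.

    Assumption: a failed run (predicted is None) counts as an incorrect answer.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to score")
    failures = sum(1 for r in results if r.predicted is None)
    correct = sum(1 for r in results if r.predicted is not None and r.predicted == r.gold)
    answered = total - failures
    return {
        "accuracy": correct / total,
        "failure_rate": failures / total,
        "accuracy_when_code_runs": correct / answered if answered else 0.0,
    }


# Toy data for illustration only; these are not results from the benchmark.
results = [
    Result(gold="yes", predicted="yes"),  # code ran, correct
    Result(gold="no", predicted=None),    # code failed to run
    Result(gold="no", predicted="yes"),   # code ran, wrong
    Result(gold="yes", predicted=None),   # code failed to run
    Result(gold="no", predicted="no"),    # code ran, correct
]
metrics = score(results)
print(metrics)
```

Under this scoring rule, a high failure rate caps overall accuracy regardless of how well the agent answers when its code does run, which is why reporting the two figures separately is informative.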
