AI · May 23

GRAIL: AI translation for scientists application workflow on satellite data

arXiv:2605.24784 · 6.5
AI Analysis

For domain scientists analyzing satellite imagery, GRAIL provides a scalable solution without requiring expertise in distributed computing, though the approach is incremental.

GRAIL translates Python geospatial workflows into Spark-based programs, enabling scalable satellite data analysis without requiring scientists to learn new frameworks. The authors demonstrate the system on real-world workflows and report that the translated code is correct and scalable.

Domain scientists increasingly develop Python scripts to analyze satellite imagery, but these scripts do not scale to large datasets. This paper demonstrates GRAIL, an agentic translation system that converts Python geospatial workflows into executable Spark-based programs without requiring scientists to learn a new framework. Rather than fine-tuning a specialized LLM, GRAIL adapts RDPro, a Scala library for satellite data analysis, to make it LLM-ready using structured documentation, API alias functions, and repair-oriented error logs. Translation is structured as a LangGraph pipeline that decomposes code generation into explicit sections with guided inputs and outputs, enabling targeted repair without regenerating the full program. We demonstrate GRAIL on real-world geospatial workflows and showcase the correctness and scalability of the translated code.
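The section-wise generation with targeted repair described in the abstract can be illustrated with a minimal sketch. This is not GRAIL's implementation and does not use LangGraph: it is a plain-Python control loop under stated assumptions. The section names (`load`, `transform`, `write`), the `translate` function, and the stub generator and checker are all hypothetical stand-ins for the LLM call and the compile/run check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical section names; the abstract does not list the actual sections.
SECTIONS = ["load", "transform", "write"]


@dataclass
class TranslationState:
    source: str                                              # original Python workflow
    sections: Dict[str, str] = field(default_factory=dict)   # generated code per section
    attempts: Dict[str, int] = field(default_factory=dict)   # generation calls per section


def translate(
    source: str,
    generate: Callable[[str, str, Optional[str]], str],
    check: Callable[[str, str], Optional[str]],
    max_repairs: int = 2,
) -> TranslationState:
    """Generate each section in turn; on failure, regenerate only that section."""
    state = TranslationState(source=source)
    for name in SECTIONS:
        error: Optional[str] = None
        for _ in range(max_repairs + 1):
            # The previous error (if any) is fed back so the generator can repair it.
            state.sections[name] = generate(name, source, error)
            state.attempts[name] = state.attempts.get(name, 0) + 1
            error = check(name, state.sections[name])
            if error is None:
                break  # section accepted; earlier sections are left untouched
        else:
            raise RuntimeError(f"section {name!r} still failing: {error}")
    return state


def stub_generate(name: str, source: str, error: Optional[str]) -> str:
    # Stand-in for an LLM call: the first 'transform' attempt is deliberately bad.
    if name == "transform" and error is None:
        return "// TODO transform"
    return f"// {name} section ok"


def stub_check(name: str, code: str) -> Optional[str]:
    # Stand-in for compiling or running the section; returns an error message or None.
    return "unresolved TODO" if "TODO" in code else None


result = translate("ndvi = (nir - red) / (nir + red)", stub_generate, stub_check)
print(result.attempts)
```

In this sketch only the failing `transform` section is regenerated, while `load` and `write` are each generated once, which is the property the abstract attributes to decomposing the program into explicit sections.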

