AI · May 19

GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

arXiv:2605.20006 · 62.5
AI Analysis

For researchers in geospatial AI, GeoX offers a scalable method to develop spatial reasoning in VLMs, reducing reliance on expensive human annotations.

GeoX introduces a self-play framework for geospatial reasoning that uses executable programs and verifiable rewards to improve vision-language models without large-scale human annotation. It improves its base models by up to 5.5 points on average and matches baselines trained on millions of curated examples.

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to obtain a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated examples. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.
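To make the verifier idea concrete, here is a minimal sketch of how executing a program can yield a binary reward in each of the three modes. Everything in it is an assumption for illustration and not the paper's implementation: the `Obj` scene representation (standing in for the output of an image understanding tool), the two primitives `count` and `count_within`, the tuple encoding of programs, and the reading of the modes as deduction (predict the output), abduction (recover a missing input), and induction (recover the program from examples).

```python
import math
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Obj:
    """Hypothetical detected object: a label and a centre point."""
    label: str
    x: float
    y: float


Scene = List[Obj]
Program = Tuple[Any, ...]  # e.g. ("count", "building")


def run(program: Program, scene: Scene) -> int:
    """Execute a program built from two illustrative spatial primitives."""
    op, *args = program
    if op == "count":
        (label,) = args
        return sum(1 for o in scene if o.label == label)
    if op == "count_within":
        label, anchor, radius = args
        anchors = [o for o in scene if o.label == anchor]
        if len(anchors) != 1:
            raise ValueError("anchor must be unique in the scene")
        a = anchors[0]
        return sum(
            1 for o in scene
            if o.label == label and math.hypot(o.x - a.x, o.y - a.y) <= radius
        )
    raise ValueError(f"unknown primitive: {op}")


def _safe_run(program: Program, scene: Scene):
    """Return (ok, value); malformed programs fail instead of raising."""
    try:
        return True, run(program, scene)
    except (ValueError, TypeError):
        return False, None


def reward_deduction(program: Program, scene: Scene, predicted: int) -> float:
    """Solver predicts the output; the verifier executes the program to check."""
    ok, truth = _safe_run(program, scene)
    return 1.0 if ok and predicted == truth else 0.0


def reward_abduction(partial: Program, scene: Scene, target: int, proposed_arg: Any) -> float:
    """Solver proposes the missing last argument that reproduces the target output."""
    ok, value = _safe_run(partial + (proposed_arg,), scene)
    return 1.0 if ok and value == target else 0.0


def reward_induction(proposed: Program, examples: List[Tuple[Scene, int]]) -> float:
    """Solver proposes a program; it must reproduce every (scene, output) example."""
    for scene, expected in examples:
        ok, value = _safe_run(proposed, scene)
        if not ok or value != expected:
            return 0.0
    return 1.0


# Toy scene: one tower and three buildings at distances 5, 10, and 1 from it.
scene = [
    Obj("tower", 0.0, 0.0),
    Obj("building", 3.0, 4.0),
    Obj("building", 6.0, 8.0),
    Obj("building", 1.0, 0.0),
]
program = ("count_within", "building", "tower", 5.0)
```

Because the reward comes from executing the program rather than from a human label, the same check can score both the proposer (did it emit a program that runs?) and the solver (did its answer match the execution?). How GeoX actually shapes and combines those two rewards is not specified in the abstract.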

