AI · Mar 16

RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

arXiv:2603.14941 · 94.9 · h-index: 3 · Has Code
AI Analysis

This work addresses the need for integrated remote sensing analysis and forecasting, offering a novel approach that could benefit environmental monitoring and planning applications, though it appears incremental in combining existing tasks.

The paper tackles remote sensing world modeling with RS-WorldModel, a unified model that jointly handles spatiotemporal change understanding and text-guided future scene forecasting. With only 2B parameters, it achieves an FID of 43.13 on forecasting and surpasses open-source models up to 120 times larger on most change understanding metrics.

Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120× larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
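For readers unfamiliar with the FID figure quoted above: the Fréchet Inception Distance fits a Gaussian to image features from real and generated sets and reports d² = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), so lower is better. The sketch below is not the paper's evaluation code. It assumes diagonal covariances (a simplification; standard FID uses full covariance matrices of Inception-v3 features), and the function name and toy feature vectors are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def frechet_distance_diag(real_feats, gen_feats):
    """Squared Frechet distance between two Gaussians with diagonal covariance.

    ASSUMPTION: features are treated as independent per dimension. Standard
    FID uses full covariance matrices, so this is only an illustration.
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        real_col = [row[d] for row in real_feats]
        gen_col = [row[d] for row in gen_feats]
        mu_r, mu_g = fmean(real_col), fmean(gen_col)
        var_r, var_g = pvariance(real_col), pvariance(gen_col)
        # Mean term plus trace term; with diagonal covariances the trace
        # term reduces to var_r + var_g - 2 * sqrt(var_r * var_g).
        total += (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * sqrt(var_r * var_g)
    return total


# Toy 2-D "features" (hypothetical values, not real Inception activations).
real = [[0.0, 1.0], [2.0, 3.0]]
shifted = [[1.0, 1.0], [3.0, 3.0]]
same_score = frechet_distance_diag(real, real)
shifted_score = frechet_distance_diag(real, shifted)
```

Identical feature sets score zero, and shifting the generated features away from the real ones increases the score, which is why a lower FID indicates generated scenes that are statistically closer to real imagery.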
