AI · Mar 7

Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding

arXiv:2603.07039v1 · Has Code
Predicted impact top 39% in AI · last 90 days · Originality Highly original
AI Analysis

This work provides a new method for planetary-scale spatio-temporal modeling, which is significant for researchers and practitioners in ecological forecasting and other Earth science applications.

This paper introduces DeepEarth, a self-supervised multi-modal world model built on Earth4D, a novel planetary-scale 4D space-time positional encoder. Earth4D extends 3D multi-resolution hash encoding to include time, which lets it scale efficiently across the planet and over centuries at sub-meter, sub-second precision. It achieves state-of-the-art performance on an ecological forecasting benchmark.

We present DeepEarth, a self-supervised multi-modal world model with Earth4D, a novel planetary-scale 4D space-time positional encoder. Earth4D extends 3D multi-resolution hash encoding to include time, efficiently scaling across the planet over centuries with sub-meter, sub-second precision. Multi-modal encoders (e.g. vision-language models) are fused with Earth4D embeddings and trained via masked reconstruction. We demonstrate Earth4D's expressive power by achieving state-of-the-art performance on an ecological forecasting benchmark. Earth4D with learnable hash probing surpasses a multi-modal foundation model pre-trained on substantially more data. Access open source code and download models at: https://github.com/legel/deepearth
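To make the idea of extending a multi-resolution hash encoding with a time axis concrete, the following is a minimal NumPy sketch of a generic 4D hash-grid lookup over normalized (x, y, z, t) coordinates. It is an illustration under stated assumptions, not the Earth4D implementation: the function names `hash_corners` and `encode_4d`, the prime constants, the table sizes, and the resolution schedule are all hypothetical choices, the tables are random rather than learned, and learnable hash probing is not modeled. Earth4D may also factor space and time differently from the direct 4D grid shown here.

```python
import numpy as np

# Illustrative hashing constants (an assumption, not taken from the DeepEarth code).
PRIMES = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint64)

def hash_corners(corners, table_size):
    """Spatial hash: XOR of per-axis (index * prime), reduced modulo the table size."""
    c = corners.astype(np.uint64)
    h = np.zeros(c.shape[:-1], dtype=np.uint64)
    for axis in range(c.shape[-1]):
        h ^= c[..., axis] * PRIMES[axis]
    return (h % np.uint64(table_size)).astype(np.int64)

def encode_4d(coords, tables, base_res=4, growth=2.0):
    """Encode (N, 4) coords in [0, 1) as concatenated per-level features.

    tables: list of (T, F) feature arrays, one per resolution level.
    Returns an array of shape (N, sum of F over levels).
    """
    # The 16 corners of a 4D grid cell, as 0/1 offsets along each axis.
    offsets = np.array(list(np.ndindex(2, 2, 2, 2)))
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)          # grid resolution at this level
        scaled = coords * res
        lower = np.floor(scaled).astype(np.int64)      # (N, 4) lower cell corner
        frac = scaled - lower                          # (N, 4) position inside the cell
        corners = lower[:, None, :] + offsets[None]    # (N, 16, 4)
        # Quadrilinear weights: product over axes of frac or (1 - frac).
        w = np.where(offsets[None] == 1,
                     frac[:, None, :],
                     1.0 - frac[:, None, :]).prod(axis=-1)   # (N, 16)
        idx = hash_corners(corners, table.shape[0])    # (N, 16) table rows
        feats.append((w[..., None] * table[idx]).sum(axis=1))
    return np.concatenate(feats, axis=-1)

# Usage: 4 levels, 2 features per level, 5 random space-time points.
rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-2, size=(2**14, 2)) for _ in range(4)]
coords = rng.random((5, 4))
emb = encode_4d(coords, tables)
print(emb.shape)
```

In a trained model the tables would be learned parameters, and the resulting embedding would be fused with the outputs of the modality encoders before masked reconstruction. Because each level hashes into a fixed-size table, memory stays bounded as resolution grows, which is the property that makes this family of encoders attractive at large scale.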
