AI · Mar 7

Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding

Lance Legel, Qin Huang, Brandon Voelker, Daniel Neamati, Patrick Alan Johnson, Favyen Bastani, Jeff Rose, James Ryan Hennessy, Robert Guralnick, Douglas Soltis, Pamela Soltis, Shaowen Wang

arXiv:2603.07039v1 · 13.7 · h-index: 16 · Has Code

Predicted impact: top 39% in AI · last 90 days · Originality: Highly original

AI Analysis

This work provides a new method for planetary-scale spatio-temporal modeling, which is significant for researchers and practitioners in ecological forecasting and other Earth science applications.

This paper introduces DeepEarth, a self-supervised multi-modal world model that uses Earth4D, a novel planetary-scale 4D space-time positional encoder. Earth4D extends 3D multi-resolution hash encoding to include time, enabling efficient scaling across the planet over centuries with sub-meter, sub-second precision, and achieves state-of-the-art performance on an ecological forecasting benchmark.

We present DeepEarth, a self-supervised multi-modal world model with Earth4D, a novel planetary-scale 4D space-time positional encoder. Earth4D extends 3D multi-resolution hash encoding to include time, efficiently scaling across the planet over centuries with sub-meter, sub-second precision. Multi-modal encoders (e.g., vision-language models) are fused with Earth4D embeddings and trained via masked reconstruction. We demonstrate Earth4D's expressive power by achieving state-of-the-art performance on an ecological forecasting benchmark. Earth4D with learnable hash probing surpasses a multi-modal foundation model pre-trained on substantially more data. Open-source code and downloadable models are available at: https://github.com/legel/deepearth
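To make the idea of extending multi-resolution hash encoding from 3D to 4D more concrete, here is a minimal pure-Python sketch. It is not the DeepEarth or Earth4D implementation, and the abstract does not say exactly how time is folded into the encoder. The sketch simply assumes one plausible reading: a normalized (x, y, z, t) point is looked up in a hashed grid at several resolutions, the 16 surrounding corners are blended with quadrilinear weights, and the per-level features are concatenated. All names (`make_tables`, `hash_index`, `encode_4d`), the hash multipliers, and the default sizes are illustrative assumptions. Learnable hash probing and the training loop are left out.

```python
import itertools
import random

# Assumed per-axis hash multipliers (illustrative constants, not taken from the paper).
HASH_MULTIPLIERS = (1, 2654435761, 805459861, 3674653429)


def make_tables(num_levels=4, table_size=2**12, feat_dim=2, seed=0):
    """Create one small feature table per resolution level (randomly initialized)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)] for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def hash_index(corner, table_size):
    """XOR-combine the integer grid coordinates into a table slot."""
    h = 0
    for coord, mult in zip(corner, HASH_MULTIPLIERS):
        h ^= coord * mult  # Python ints do not overflow; the modulo below bounds the result
    return h % table_size


def encode_4d(point, tables, base_res=4, growth=2.0):
    """Encode (x, y, z, t), each normalized to [0, 1], into concatenated level features."""
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)      # grid resolution at this level
        scaled = [p * res for p in point]
        lower = [int(s) for s in scaled]           # floor, valid for non-negative inputs
        frac = [s - l for s, l in zip(scaled, lower)]
        out = [0.0] * len(table[0])
        # Blend the 2^4 = 16 corners of the enclosing 4D cell.
        for offsets in itertools.product((0, 1), repeat=4):
            weight = 1.0
            for o, f in zip(offsets, frac):
                weight *= f if o else 1.0 - f
            corner = [l + o for l, o in zip(lower, offsets)]
            entry = table[hash_index(corner, len(table))]
            for i, value in enumerate(entry):
                out[i] += weight * value
        features.extend(out)
    return features


tables = make_tables()
embedding = encode_4d((0.1, 0.2, 0.3, 0.4), tables)  # 4 levels x 2 features
```

In a trained model the table entries would be learned parameters, and the resulting embedding would be fused with the outputs of the other modality encoders. Here the entries are only randomly initialized to show the lookup and interpolation path.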
