TerraFlow: Multimodal, Multitemporal Representation Learning for Earth Observation
TerraFlow addresses the challenge of robust, sequence-aware learning for Earth observation data, offering incremental improvements over existing foundation models.
The paper tackles multimodal, multitemporal learning for Earth observation by proposing TerraFlow, which outperforms state-of-the-art foundation models by up to 50% in F1 score and 24% in Brier score, on tasks including natural disaster risk map prediction.
We propose TerraFlow, a novel approach to multimodal, multitemporal learning for Earth observation. TerraFlow builds on temporal training objectives that enable sequence-aware learning across space, time, and modality, while remaining robust to the variable-length inputs commonly encountered in real-world Earth observation data. Our experiments demonstrate that TerraFlow outperforms state-of-the-art foundation models for Earth observation across all temporal tasks of the GEO-Bench-2 benchmark. We additionally demonstrate that TerraFlow takes initial steps towards deep-learning-based risk map prediction for natural disasters, a task on which other state-of-the-art foundation models frequently collapse. TerraFlow outperforms state-of-the-art foundation models by up to 50% in F1 score and 24% in Brier score.
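To make the two reported metrics concrete, the following minimal sketch computes the F1 score and the Brier score for a binary risk map flattened to a list of per-pixel probabilities. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the function names, the 0.5 decision threshold, and the reading of "outperforms by X%" as a relative change against a baseline are all assumptions made here. Since a lower Brier score is better, a gain in Brier score is taken to mean a relative reduction.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 labels.

    Lower is better; 0.0 is a perfect probabilistic prediction.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def f1_score(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding probabilities.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    # Convention: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def relative_gain(model: float, baseline: float,
                  lower_is_better: bool = False) -> float:
    """Relative improvement of `model` over `baseline` (assumed reading of 'by X%')."""
    if lower_is_better:
        return (baseline - model) / baseline
    return (model - baseline) / baseline


# Toy data, not results from the paper.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.6, 0.4]
toy_brier = brier_score(toy_probs, toy_labels)
toy_f1 = f1_score(toy_probs, toy_labels)
```

Under this reading, a model with a Brier score of 0.19 against a baseline of 0.25 would show a 24% relative gain, and an F1 of 0.75 against 0.5 would show a 50% gain. These figures are chosen only to illustrate the arithmetic.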