CV · Oct 27, 2025

Quantizing Space and Time: Fusing Time Series and Images for Earth Observation

arXiv:2510.23118v3 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses multimodal fusion challenges in Earth observation, offering incremental advances with practical applications in generating global temperature profiles from satellite data.

The paper tackles the problem of fusing time series with single-timestamp images for Earth observation. It proposes a task-agnostic framework that enables cross-modal generation and robust downstream performance, with average improvements of 6% in R^2 and 2% in RMSE over task-specific fusion, and 50% in R^2 and 12% in RMSE over baseline methods.

We propose a task-agnostic framework for multimodal fusion of time series and single timestamp images, enabling cross-modal generation and robust downstream performance. Our approach explores deterministic and learned strategies for time series quantization and then leverages a masked correlation learning objective, aligning discrete image and time series tokens in a unified representation space. Instantiated in the Earth observation domain, the pretrained model generates consistent global temperature profiles from satellite imagery and is validated through counterfactual experiments. Across downstream tasks, our task-agnostic pretraining outperforms task-specific fusion by 6% in R^2 and 2% in RMSE on average, and exceeds baseline methods by 50% in R^2 and 12% in RMSE. Finally, we analyze gradient sensitivity across modalities, providing insights into model robustness. Code, data, and weights will be released under a permissive license.
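The abstract does not spell out which deterministic quantization strategy is used, so the following is only a minimal sketch of one plausible option: equal-width binning of a time series into integer tokens, followed by random masking of those tokens as the input-corruption step that a masked objective would train against. The function names (`quantize_uniform`, `dequantize`, `mask_tokens`), the bin count, and the choice of `mask_id` are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def quantize_uniform(series, n_bins=16, lo=None, hi=None):
    """Map real values to integer tokens in [0, n_bins - 1] using equal-width bins.

    Assumption: equal-width binning stands in for a "deterministic" strategy;
    the paper may use a different scheme.
    """
    series = np.asarray(series, dtype=float)
    lo = series.min() if lo is None else lo
    hi = series.max() if hi is None else hi
    edges = np.linspace(lo, hi, n_bins + 1)
    # Use interior edges only so the result lies in 0..n_bins-1;
    # values at or beyond the range limits fall into the end bins.
    tokens = np.digitize(series, edges[1:-1])
    return tokens, edges


def dequantize(tokens, edges):
    """Map tokens back to the centre value of their bin."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.asarray(tokens)]


def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Replace a random subset of tokens with a reserved mask id.

    Returns the masked copy and the indices that were masked.
    """
    tokens = np.asarray(tokens)
    n_mask = int(round(mask_ratio * tokens.size))
    idx = rng.choice(tokens.size, size=n_mask, replace=False)
    masked = tokens.copy()
    masked[idx] = mask_id
    return masked, idx


if __name__ == "__main__":
    temps = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # toy temperature profile
    tokens, edges = quantize_uniform(temps, n_bins=4)
    recon = dequantize(tokens, edges)
    rng = np.random.default_rng(0)
    # mask_id = 4 is one past the largest valid token (an assumed convention)
    masked, idx = mask_tokens(tokens, mask_ratio=0.4, mask_id=4, rng=rng)
    print(tokens, recon, masked, idx)
```

With equal-width bins the reconstruction error for in-range values is bounded by half the bin width, which makes the trade-off between vocabulary size and fidelity easy to reason about. A learned quantizer would replace `quantize_uniform` with a trained codebook lookup.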

