DC · CV · May 9

Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

arXiv:2605.08633 · 42.8
Predicted impact: top 37% in DC · last 90 days
Originality: Highly original
AI Analysis

This work addresses the growing data bottleneck in Earth observation by enabling extreme compression that transforms data from passive storage to an active, task-adaptive resource.

The authors propose a generative compression framework for Earth observation data that leverages historical priors to achieve 100x to 10,000x data reduction across downstream tasks. They demonstrate exascale training on a CPU supercomputer, sustaining 1.54 EFLOP/s and peaking at 2.16 EFLOP/s.

Earth observation is becoming one of the largest data-producing activities in science, yet current pipelines still treat compression as a storage and transmission tool rather than a new way to use data. We present a generative compression framework that learns from historical Earth observation archives and enables on-demand 100x to 10,000x data reduction across downstream tasks. Unlike general visual data, Earth observation repeatedly measures the same evolving planet, making historical-prior learning feasible for extreme compression. To realize this paradigm, we train large generative compression models at exascale on the LineShine Armv9 CPU supercomputer, with co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism. Our implementation sustains 1.54 EFLOP/s and peaks at 2.16 EFLOP/s in end-to-end training. This work shows that historical-prior generative compression can turn Earth observation data into an active, task-adaptive foundation for acquisition, delivery, storage, and scientific use.
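To give a sense of scale for the figures quoted above, the short sketch below works through the arithmetic. The 1 PB archive size and the helper name `compressed_size` are illustrative assumptions, not values or code from the paper; only the 100x and 10,000x ratios and the 1.54 and 2.16 EFLOP/s figures come from the abstract.

```python
def compressed_size(raw_bytes: float, ratio: float) -> float:
    """Return the size after compression, where ratio = raw size / compressed size."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_bytes / ratio


PB = 1e15  # bytes in a petabyte (decimal)
TB = 1e12  # bytes in a terabyte (decimal)

# Hypothetical archive size, chosen only for illustration.
raw_archive = 1 * PB

# Reduction ratios quoted in the abstract.
for ratio in (100, 10_000):
    size_tb = compressed_size(raw_archive, ratio) / TB
    print(f"{ratio}x reduction: {size_tb:.1f} TB")

# Sustained and peak training throughput quoted in the abstract (EFLOP/s).
sustained, peak = 1.54, 2.16
sustained_fraction = sustained / peak
print(f"sustained / peak = {sustained_fraction:.1%}")
```

Under these assumptions, a 1 PB archive would shrink to 10 TB at 100x and to 0.1 TB (100 GB) at 10,000x, and the sustained throughput is roughly 71% of the peak.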

