LG · DC · Apr 20

Preserving Clusters in Error-Bounded Lossy Compression of Particle Data

arXiv:2604.18801 · 14.4 · h-index: 35
AI Analysis

For scientists using large-scale particle simulations, this work enables lossy compression without sacrificing the validity of clustering-based downstream analysis, addressing a key bottleneck in scientific data management.

Existing lossy compressors for particle data provide only pointwise error bounds and cannot guarantee preservation of clustering structures (e.g., single-linkage clustering), which are critical for scientific analysis. The authors propose a correction-based technique that, when applied to decompressed data from compressors like SZ3 and Draco, preserves clustering outcomes while maintaining competitive compression ratios, as demonstrated on cosmology and molecular dynamics datasets.
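To make the gap between pointwise error bounds and clustering preservation concrete, here is a minimal sketch. It is not the authors' code: the helper `fof_labels`, the three-particle toy dataset, the linking length `b`, and the error bound `eps` are all assumptions chosen for illustration. It shows a perturbation that respects a pointwise bound of 0.01 per coordinate yet splits a single-linkage (Friends-of-Friends) cluster.

```python
import numpy as np

def fof_labels(points, linking_length):
    """Single-linkage (Friends-of-Friends) labels via union-find.

    Brute-force O(n^2) pair scan, fine for a toy example only.
    Each particle is labeled with the smallest index in its cluster.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= linking_length:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]

b = 1.0      # hypothetical linking length
eps = 0.01   # hypothetical pointwise error bound per coordinate

original = np.array([[0.0, 0.0], [0.99, 0.0], [3.0, 0.0]])
# Hypothetical decompressed positions: every coordinate moved by at most eps.
decompressed = original + np.array([[-eps, 0.0], [eps, 0.0], [0.0, 0.0]])

labels_before = fof_labels(original, b)      # particles 0 and 1 are linked
labels_after = fof_labels(decompressed, b)   # the link 0-1 is broken

print(labels_before, labels_after)
```

The first two particles sit 0.99 apart in the original data and about 1.01 apart after the perturbation, so the pair crosses the linking length even though each coordinate moved by no more than `eps`.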

Lossy compression is widely used to reduce storage and I/O costs for large-scale particle datasets in scientific applications such as cosmology, molecular dynamics, and fluid dynamics, where clustering structures (e.g., single-linkage or Friends-of-Friends) are critical for downstream analysis. However, existing compressors typically provide only pointwise error bounds on particle positions and offer no guarantees on preserving clustering outcomes; even small perturbations can alter cluster connectivity and compromise scientific validity. We propose a correction-based technique to preserve single-linkage clustering under lossy compression, operating on decompressed data from off-the-shelf compressors such as SZ3 and Draco. Our key contributions are threefold: (1) a clustering-aware correction algorithm that identifies vulnerable particle pairs via spatial partitioning and local neighborhood search; (2) an optimization-based formulation that enforces clustering consistency using projected gradient descent with a loss that encodes pairwise distance violations; and (3) a scalable GPU-accelerated and distributed implementation for large-scale datasets. Experiments on cosmology and molecular dynamics datasets show that our method effectively preserves clustering results while maintaining competitive compression performance compared with SZ3, ZFP, Draco, LCP, and space-filling-curve-based schemes.
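The two algorithmic ingredients named in the abstract, neighbor search over a spatial partition and projected gradient descent on a pairwise-distance loss, can be sketched on the CPU with NumPy. This is an illustrative reading of the abstract, not the paper's formulation. The function names `candidate_pairs` and `correct`, the squared-hinge loss, the `margin`, the step size `lr`, and the projection onto a box of radius `eps` around the original positions are all assumptions. The sketch also enforces a stricter condition than the paper needs: it preserves the linked/unlinked status of every nearby pair, which is sufficient for identical single-linkage clusters but not necessary.

```python
import itertools
from collections import defaultdict
import numpy as np

def candidate_pairs(x, cell):
    """Uniform-grid neighbor search: pairs (i < j) with |x_i - x_j| <= cell."""
    grid = defaultdict(list)
    keys = np.floor(x / cell).astype(int)
    for idx, key in enumerate(map(tuple, keys)):
        grid[key].append(idx)
    offsets = list(itertools.product((-1, 0, 1), repeat=x.shape[1]))
    pairs = set()
    for key, members in grid.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            for i in members:
                for j in grid.get(other, ()):
                    if i < j and np.linalg.norm(x[i] - x[j]) <= cell:
                        pairs.add((i, j))
    return sorted(pairs)

def correct(x, y, b, eps, margin=None, lr=0.1, max_iter=200):
    """Projected gradient descent on a squared-hinge pairwise loss (assumed form).

    x: original positions, y: decompressed positions with |y - x| <= eps
    per coordinate, b: linking length. Returns (corrected y, success flag).
    """
    margin = 0.2 * eps if margin is None else margin
    # A pair can only change status if its original distance is within
    # 2*sqrt(dim)*eps of b, so this radius covers all vulnerable pairs.
    radius = b + 2.0 * np.sqrt(x.shape[1]) * eps
    pairs = np.array(candidate_pairs(x, radius), dtype=int).reshape(-1, 2)
    y = y.copy()
    if len(pairs) == 0:
        return y, True
    i, j = pairs[:, 0], pairs[:, 1]
    linked = np.linalg.norm(x[i] - x[j], axis=1) <= b  # ground-truth status
    for _ in range(max_iter):
        diff = y[i] - y[j]
        d = np.linalg.norm(diff, axis=1)
        if np.array_equal(d <= b, linked):
            break
        # Residual > 0: linked pair too far apart; < 0: unlinked pair too close.
        r = np.where(linked,
                     np.maximum(0.0, d - (b - margin)),
                     -np.maximum(0.0, (b + margin) - d))
        g = (2.0 * r / np.maximum(d, 1e-12))[:, None] * diff
        grad = np.zeros_like(y)
        np.add.at(grad, i, g)
        np.add.at(grad, j, -g)
        # Gradient step, then projection back onto the pointwise error box.
        y = np.clip(y - lr * grad, x - eps, x + eps)
    d = np.linalg.norm(y[i] - y[j], axis=1)
    return y, bool(np.array_equal(d <= b, linked))

b, eps = 1.0, 0.01
x = np.array([[0.0, 0.0], [0.99, 0.0], [3.0, 0.0]])
y = x + np.array([[-eps, 0.0], [eps, 0.0], [0.0, 0.0]])
y_fixed, ok = correct(x, y, b, eps)
print(ok, np.linalg.norm(y_fixed[0] - y_fixed[1]))
```

In this toy case the first two particles are pulled back within the linking length while every coordinate stays within `eps` of the original. Because the correction needs the original positions, a sketch like this would have to run on the compression side, with the resulting adjustments stored alongside the compressed data; how the paper encodes them is not stated in the abstract.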

