AI
Dec 2, 2024

Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling

arXiv:2412.01754v1
1 citation
h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the need for real-time, high-throughput data compression in nuclear and high-energy physics experiments, so that their massive datasets can be reduced to sizes manageable for permanent storage.

The paper tackles the compression of extremely sparse particle collider data, which can be generated at rates of up to petabytes per second, using implicit neural representations and importance sampling. The method is competitive with traditional compression algorithms, and the importance sampling significantly speeds up network training with negligible accuracy loss.

High-energy, large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to $1$ terabyte and several petabytes per second, respectively. The development of real-time, high-throughput data compression algorithms capable of reducing this data to manageable sizes for permanent storage is of paramount importance. A unique characteristic of the tracking detector data is the extreme sparsity of particle trajectories in space, with an occupancy rate ranging from approximately $10^{-6}$ to $10\%$. Furthermore, for downstream tasks, a continuous representation of this data is often more useful than a voxel-based, discrete representation due to the inherently continuous nature of the signals involved. To address these challenges, we propose a novel approach using implicit neural representations for data learning and compression. We also introduce an importance sampling technique to accelerate the network training process. Our method is competitive with traditional compression algorithms, such as MGARD, SZ, and ZFP, while offering significant speed-ups and maintaining negligible accuracy loss through our importance sampling strategy.
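The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one common way to apply importance sampling when fitting an implicit neural representation to a sparse volume. It assumes the data is a dense NumPy array in which empty voxels are exactly zero. The function name `importance_sample`, the parameter `p_nonzero`, and the choice of proposal distribution are all hypothetical and are not taken from the paper. The idea is to draw occupied voxels far more often than their tiny occupancy would suggest, and to return per-sample weights so that a weighted loss remains an unbiased estimate of the uniform average over all voxels.

```python
import numpy as np


def importance_sample(volume, n_samples, p_nonzero=0.5, rng=None):
    """Draw training points from a sparse volume, oversampling occupied voxels.

    Hypothetical helper (not the paper's method): a fraction `p_nonzero` of the
    proposal mass is spread over nonzero voxels, the rest over empty voxels.
    Returns normalised coordinates, voxel values, and importance weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = volume.ravel()
    n_total = flat.size
    nonzero = flat != 0
    n_nz = int(nonzero.sum())

    if n_nz == 0 or n_nz == n_total:
        # Degenerate case: fall back to uniform sampling.
        probs = np.full(n_total, 1.0 / n_total)
    else:
        probs = np.where(
            nonzero,
            p_nonzero / n_nz,
            (1.0 - p_nonzero) / (n_total - n_nz),
        )

    idx = rng.choice(n_total, size=n_samples, p=probs)

    # Weight = uniform target density / proposal density, so that
    # mean(weights * per_sample_loss) estimates the uniform mean loss.
    weights = 1.0 / (n_total * probs[idx])

    # Map integer voxel indices to continuous coordinates in [-1, 1],
    # the usual input range for a coordinate network.
    grid = np.stack(np.unravel_index(idx, volume.shape), axis=-1)
    scale = np.maximum(np.array(volume.shape) - 1, 1)
    coords = 2.0 * grid / scale - 1.0

    return coords, flat[idx], weights
```

In a training loop, each batch would be drawn with this sampler, the network evaluated at `coords`, and the per-point squared error multiplied by `weights` before averaging. Whether the reweighting is kept or dropped is a design choice: dropping it biases the fit toward occupied voxels, which may or may not be desirable for a given downstream task.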

Code Implementations: 1 repo