HEP-PH · LG · HEP-EX · DATA-AN · Oct 20, 2022

Machine-Learning Compression for Particle Physics Discoveries

Jack H. Collins, Yifeng Huang, Simon Knapen, Benjamin Nachman, Daniel Whiteson

arXiv:2210.11489v2 · 4.37 citations · h-index: 92

Originality: Incremental advance

AI Analysis

This work addresses data-storage challenges in particle physics experiments by offering a new method to compress entire events so that more of them remain available for analysis. The advance is incremental, since it builds on existing VAE and compression paradigms.

The paper tackles the extreme data rates of particle physics experiments. It proposes using a β-VAE to compress entire events, at lower fidelity, for generic offline analysis, and shows in a di-muon resonance search at the LHC that the compressed data can still distinguish distinct signal morphologies.

In collider-based particle and nuclear physics experiments, data are produced at such extreme rates that only a subset can be recorded for later analysis. Typically, algorithms select individual collision events for preservation and store the complete experimental response. A relatively new alternative strategy is to additionally save a partial record for a larger subset of events, allowing for later specific analysis of a larger fraction of events. We propose a strategy that bridges these paradigms by compressing entire events for generic offline analysis but at a lower fidelity. An optimal-transport-based β Variational Autoencoder (VAE) is used to automate the compression and the hyperparameter β controls the compression fidelity. We introduce a new approach for multi-objective learning functions by simultaneously learning a VAE appropriate for all values of β through parameterization. We present an example use case, a di-muon resonance search at the Large Hadron Collider (LHC), where we show that simulated data compressed by our β-VAE has enough fidelity to distinguish distinct signal morphologies.
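The abstract names the ingredients of the training objective only at a high level: an optimal-transport reconstruction loss, a β-weighted latent penalty, and a single network parameterized to cover all β. The NumPy sketch below illustrates those ideas under stated assumptions and is not the authors' code. The entropy-regularized (Sinkhorn) OT solver, the squared-Euclidean ground cost, the convention L = OT + β·KL, the log-uniform β range, and every function name (`sinkhorn_cost`, `gaussian_kl`, `beta_vae_loss`, `sample_log_beta`, `condition_on_beta`) are illustrative assumptions; the paper's normalization of β and its architecture may differ.

```python
# Illustrative sketch only: names, the Sinkhorn solver, the ground cost and the
# L = OT + beta * KL convention are assumptions, not the paper's implementation.
import numpy as np


def _logsumexp(m, axis):
    """Numerically stable log(sum(exp(m))) along one axis."""
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_cost(x, a, y, b, eps=0.05, n_iter=200):
    """Transport cost of the entropy-regularized OT plan between weighted point clouds.

    x: (n, d) points with weights a (n,); y: (m, d) points with weights b (m,).
    Weights must be positive and each set must sum to 1.
    """
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared-Euclidean ground cost
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # Log-domain Sinkhorn: alternately rescale the plan to match row and column marginals.
        f = eps * (np.log(a) - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (np.log(b) - _logsumexp((f[:, None] - cost) / eps, axis=0))
    plan = np.exp((f[:, None] + g[None, :] - cost) / eps)
    return float((plan * cost).sum())


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var))


def beta_vae_loss(x, a, x_rec, a_rec, mu, log_var, beta, eps=0.05):
    """OT reconstruction term plus beta-weighted KL term.

    Small beta favors faithful reconstruction (high fidelity); large beta pushes the
    latent posterior toward the prior, i.e. stronger compression and lower fidelity.
    """
    return sinkhorn_cost(x, a, x_rec, a_rec, eps=eps) + beta * gaussian_kl(mu, log_var)


def sample_log_beta(rng, n_events, log_beta_min=-6.0, log_beta_max=0.0):
    """Draw one log(beta) per event (hypothetical range) so one network sees many beta values."""
    return rng.uniform(log_beta_min, log_beta_max, size=n_events)


def condition_on_beta(features, log_beta):
    """Append log(beta) as an extra input column for a beta-parameterized encoder/decoder."""
    return np.concatenate([features, log_beta[:, None]], axis=1)


if __name__ == "__main__":
    x = np.array([[0.0, 0.0], [1.0, 0.0]])
    w = np.array([0.5, 0.5])
    shifted = x + np.array([0.0, 0.5])
    # Each point moves by 0.5, so the squared-distance OT cost is approximately 0.25.
    print(sinkhorn_cost(x, w, shifted, w))
```

Feeding log β to the network, and sampling a fresh β per event during training, is one common way to realize "one VAE for all values of β". After training, a single model could then be evaluated at whichever β gives the desired trade-off between storage size and fidelity.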

