ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset

arXiv:2512.15230
AI Analysis

ColliderML fills a major gap for machine learning researchers working on collider physics by providing standardized, accessible detector-level data.

The authors introduce ColliderML, a large open dataset of simulated proton-proton collisions under High-Luminosity LHC conditions. It provides one million events across ten Standard Model and Beyond Standard Model processes, addressing the lack of detector-level data for machine learning research in collider physics.

We introduce ColliderML, a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s} = 14$ TeV, mean pile-up $\mu = 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading-order matrix-element calculations and showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release, hosted on the ML-friendly Hugging Face platform, fills a major gap for machine learning (ML) research on detector-level data. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the data format and access, and report initial collider physics benchmarks.
