HEP-EXDec 9, 2022
FAIR AI Models in High Energy PhysicsJavier Duarte, Haoyang Li, Avik Roy et al.
The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.
HEP-EXSep 12, 2023
Electron Energy Regression in the CMS High-Granularity Calorimeter PrototypeRoger Rusack, Bhargav Joshi, Alpana Alpana et al.
We present a new publicly available dataset that contains simulated data of a novel calorimeter to be installed at the CERN Large Hadron Collider. This detector will have more than six-million channels with each channel capable of position, ionisation and precision time measurement. Reconstructing these events in an efficient way poses an immense challenge which is being addressed with the latest machine learning techniques. As part of this development a large prototype with 12,000 channels was built and a beam of high-energy electrons incident on it. Using machine learning methods we have reconstructed the energy of incident electrons from the energies of three-dimensional hits, which is known to some precision. By releasing this data publicly we hope to encourage experts in the application of machine learning to develop efficient and accurate image reconstruction of these electrons.
INS-DETMar 23, 2023
Predicting the Future of the CMS Detector: Crystal Radiation Damage and Machine Learning at the LHCBhargav Joshi, Taihui Li, Buyun Liang et al.
The 75,848 lead tungstate crystals in CMS experiment at the CERN Large Hadron Collider are used to measure the energy of electrons and photons produced in the proton-proton collisions. The optical transparency of the crystals degrades slowly with radiation dose due to the beam-beam collisions. The transparency of each crystal is monitored with a laser monitoring system that tracks changes in the optical properties of the crystals due to radiation from the collision products. Predicting the optical transparency of the crystals, both in the short-term and in the long-term, is a critical task for the CMS experiment. We describe here the public data release, following FAIR principles, of the crystal monitoring data collected by the CMS Collaboration between 2016 and 2018. Besides describing the dataset and its access, the problems that can be addressed with it are described, as well as an example solution based on a Long Short-Term Memory neural network developed to predict future behavior of the crystals.
HEP-EXAug 4, 2021
A FAIR and AI-ready Higgs boson decay datasetYifan Chen, E. A. Huerta, Javier Duarte et al.
To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.