Particle identification with machine learning from incomplete data in the ALICE experiment
This work improves particle identification for high-energy physics experiments, but the contribution is incremental: it applies existing ML methods to a known bottleneck.
The paper addresses particle identification in the ALICE experiment with neural networks that use Feature Set Embedding and attention to handle incomplete data, and reports better performance than traditional rectangular cuts.
The ALICE experiment at the LHC measures properties of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. Such studies require accurate particle identification (PID). ALICE provides PID information via several detectors for particles with momentum from about 100 MeV/c up to 20 GeV/c. Traditionally, particles are selected with rectangular cuts. Much better performance can be achieved with machine learning (ML) methods. Our solution uses multiple neural networks (NN) serving as binary classifiers. Moreover, we extended our particle classifier with Feature Set Embedding and attention in order to train on data with incomplete samples. We also present the integration of the ML project with the ALICE analysis software, and we discuss domain adaptation, the ML technique needed to transfer the knowledge between simulated and real experimental data.
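To illustrate how Feature Set Embedding with attention can cope with incomplete samples, the following is a minimal sketch and not the architecture used in the paper, which the abstract does not specify. It assumes a missing detector signal is marked as `None`, encodes each observed feature as a (one-hot index, value) vector, and pools the resulting set with dot-product attention. The function names `to_feature_set` and `attention_pool`, the one-hot encoding, and the fixed `query` vector are illustrative assumptions; in a trained model the embedding and the query would be learned.

```python
import math


def to_feature_set(sample):
    """Encode a sample with missing entries (None) as a set of vectors.

    Each observed feature i with value v becomes one_hot(i) + [v].
    Missing features are skipped rather than imputed.
    """
    n = len(sample)
    vectors = []
    for i, value in enumerate(sample):
        if value is None:  # assumed marker for "detector gave no signal"
            continue
        one_hot = [1.0 if j == i else 0.0 for j in range(n)]
        vectors.append(one_hot + [value])
    return vectors


def attention_pool(vectors, query):
    """Pool a variable-size set into one fixed-size vector.

    Scores are dot products with `query`; weights are their softmax.
    """
    if not vectors:
        raise ValueError("sample has no observed features")
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[k] for w, v in zip(weights, vectors)) / total
        for k in range(dim)
    ]


# Hypothetical sample with three detector features, the second one missing.
sample = [0.5, None, 1.2]
feature_set = to_feature_set(sample)
# A zero query (illustrative only) gives uniform attention weights.
pooled = attention_pool(feature_set, query=[0.0, 0.0, 0.0, 0.0])
```

The pooled vector has the same length regardless of how many features are present, so it could be passed to an ordinary binary classifier without first filling in the missing values.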