Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training

Saúl Alonso-Monsalve, Fabio Cufino, Umut Kose, Anna Mascellani, André Rubbia

arXiv:2604.070379.1

Predicted impact: top 86% in HEP-EX · last 90 days

Originality: Incremental advance

AI Analysis

The paper addresses scalable event interpretation for neutrino and particle-detector analysis, particularly in energy-frontier regimes. It appears incremental because it builds on existing self-supervised and ViT methods.

The paper tackles the challenge of interpreting dense, overlapping detector signatures in energy-frontier neutrino physics, where labelled data are scarce, by developing a sparse ViT framework with self-supervised pre-training. Pre-training consistently improves neutrino flavour identification, charm-quark identification, momentum regression, and vertex reconstruction over training from scratch. With only about 1,000 labelled events, the pre-trained encoder matches the flavour-classification performance of a randomly initialised model trained on roughly 10 times more data.

Accelerator-based neutrino physics is entering an energy-frontier regime in which interactions reach the TeV scale and produce exceptionally dense, overlapping detector signatures. In this regime, event interpretation becomes impractical for conventional reconstruction approaches, particularly when labelled data are scarce and the analysis spans diverse downstream objectives. We present a sparse ViT framework for learning reusable representations from heterogeneous detector data. Self-supervised pre-training combines masked autoencoder reconstruction with relational voxel-level objectives for hierarchy, ghost and particle identification, and the resulting shared encoder is then jointly fine-tuned across classification and regression tasks. Evaluated on simulated events from the proposed FASERCal concept at the LHC, we find that pre-training consistently improves neutrino flavour and charm-quark identification, momentum regression, and vertex reconstruction over training from scratch, with the addition of relational objectives yielding further gains in the most topologically complex channels. Interpretability analyses further show that pre-training yields a more structured latent space, while detector-subsystem ablations recover physically plausible channel-dependent roles for the heterogeneous inputs. A data-efficiency study shows that, with roughly $10^3$ labelled events, the pre-trained encoder already matches the flavour-classification performance of a randomly initialised model trained on an order of magnitude more data. The learned representations also transfer effectively to publicly available benchmarks spanning different detector technologies and energy scales, matching or exceeding published baselines. These results support self-supervised pre-training on multimodal detector data as a scalable route towards reusable representations for neutrino and particle-detector analysis.
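As a rough illustration of the masked autoencoder reconstruction objective mentioned in the abstract, the sketch below hides a random subset of the occupied voxels in a toy sparse event and scores a prediction only on the hidden ones. It is not the authors' implementation: the function names `split_visible_masked` and `masked_mse`, the 75% mask ratio, the toy event, and the mean-charge stand-in for a decoder are all assumptions made for this example. The relational objectives (hierarchy, ghost and particle identification) are not covered.

```python
import random

def split_visible_masked(num_voxels, mask_ratio=0.75, seed=0):
    """Randomly partition voxel indices into visible and masked sets.

    mask_ratio is an assumed value; the abstract does not state one.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_voxels))
    rng.shuffle(indices)
    num_masked = int(num_voxels * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

def masked_mse(targets, predictions, masked):
    """Mean squared error computed over the masked voxels only."""
    if not masked:
        return 0.0
    return sum((targets[i] - predictions[i]) ** 2 for i in masked) / len(masked)

# Toy sparse event (hypothetical): occupied voxels as ((x, y, z), charge) pairs.
event = [
    ((0, 0, 0), 1.0), ((0, 1, 0), 0.5), ((1, 1, 0), 2.0), ((1, 2, 1), 0.25),
    ((2, 2, 1), 1.5), ((2, 3, 1), 0.75), ((3, 3, 2), 1.25), ((3, 4, 2), 0.1),
]
charges = [charge for _, charge in event]

visible, masked = split_visible_masked(len(event), mask_ratio=0.75, seed=0)

# Stand-in for an encoder/decoder: predict the mean visible charge everywhere.
# A real model would encode the visible voxels and decode the masked ones.
mean_visible = sum(charges[i] for i in visible) / len(visible)
predictions = [mean_visible] * len(charges)

loss = masked_mse(charges, predictions, masked)
```

Restricting the loss to the masked voxels means the model is scored on what it had to infer rather than on what it was shown, which is the usual design choice for masked-reconstruction pre-training.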
