LG | AI | CV | HEP-EX | Aug 26, 2025

Fine-Tuning Vision-Language Models for Neutrino Event Analysis in High-Energy Physics Experiments

arXiv:2508.19376v1 | h-index: 15
Originality: Synthesis-oriented
AI Analysis

This paper addresses event classification for researchers in high-energy physics experiments such as NOvA and DUNE. The contribution is incremental: it applies an existing method to a new domain.

The paper tackles the classification of neutrino interactions from pixelated detector images in high-energy physics experiments by fine-tuning a vision-language model based on LLaMA 3.2. The fine-tuned model matches or exceeds an established CNN baseline on metrics such as accuracy, precision, recall, and AUC-ROC.

Recent progress in large language models (LLMs) has shown strong potential for multimodal reasoning beyond natural language. In this work, we explore the use of a fine-tuned Vision-Language Model (VLM), based on LLaMA 3.2, for classifying neutrino interactions from pixelated detector images in high-energy physics (HEP) experiments. We benchmark its performance against an established CNN baseline used in experiments like NOvA and DUNE, evaluating metrics such as classification accuracy, precision, recall, and AUC-ROC. Our results show that the VLM not only matches or exceeds CNN performance but also enables richer reasoning and better integration of auxiliary textual or semantic context. These findings suggest that VLMs offer a promising general-purpose backbone for event classification in HEP, paving the way for multimodal approaches in experimental neutrino physics.
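The evaluation metrics named in the abstract (accuracy, precision, recall, AUC-ROC) can be illustrated with a minimal sketch. This is not the paper's evaluation code: it assumes a simplified binary task (signal vs. background) rather than the multi-class setting a neutrino classifier would likely use, and the labels, scores, threshold, and function names below are hypothetical toy values chosen only for illustration.

```python
# Minimal sketch of the metrics named in the abstract, in pure Python.
# Assumption: binary labels (1 = signal, 0 = background). All data below
# is made-up toy data, NOT results from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy_precision_recall(y_true, y_pred):
    """Accuracy, precision, and recall from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive example
    scores higher than a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

acc, prec, rec = accuracy_precision_recall(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(acc, prec, rec, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC-ROC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason comparisons between classifiers often report both kinds of metric.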

