LGJan 23Code
JetFormer: A Scalable and Efficient Transformer for Jet Tagging from Offline Analysis to FPGA TriggersRuoqing Zheng, Chang Sun, Qibin Liu et al.
We present JetFormer, a versatile and scalable encoder-only Transformer architecture for particle jet tagging at the Large Hadron Collider (LHC). Unlike prior approaches that are often tailored to specific deployment regimes, JetFormer is designed to operate effectively across the full spectrum of jet tagging scenarios, from high-accuracy offline analysis to ultra-low-latency online triggering. The model processes variable-length sets of particle features without relying on input of explicit pairwise interactions, yet achieves competitive or superior performance compared to state-of-the-art methods. On the large-scale JetClass dataset, a large-scale JetFormer matches the accuracy of the interaction-rich ParT model (within 0.7%) while using 37.4% fewer FLOPs, demonstrating its computational efficiency and strong generalization. On benchmark HLS4ML 150P datasets, JetFormer consistently outperforms existing models such as MLPs, Deep Sets, and Interaction Networks by 3-4% in accuracy. To bridge the gap to hardware deployment, we further introduce a hardware-aware optimization pipeline based on multi-objective hyperparameter search, yielding compact variants like JetFormer-tiny suitable for FPGA-based trigger systems with sub-microsecond latency requirements. Through structured pruning and quantization, we show that JetFormer can be aggressively compressed with minimal accuracy loss. By unifying high-performance modeling and deployability within a single architectural framework, JetFormer provides a practical pathway for deploying Transformer-based jet taggers in both offline and online environments at the LHC. Code is available at https://github.com/walkieq/JetFormer.
LGJun 1
QUIVER: Quantum-Informed Views for Enhanced Representations in Large ML ModelsAritra Bal, Michael Binder, Markus Klute et al.
Large machine learning models benefit substantially from multimodal inputs that provide a complementary view of the same example. We introduce QUIVER (QUantum-Informed Views for Enhanced Representations, a paradigm that enriches classical data-driven features with a quantum Fisher view: a geometrically motivated, basis-independent summary of higher-order correlations captured by a variational quantum circuit (VQC) trained to perform the same task. Unlike classical feature augmentation, the quantum Fisher information matrix encodes the intrinsic geometry of the learned quantum state manifold. While this feature map, motivated by quantum information theory, is ordinarily non-trivial to model classically, it can surface statistical structure that additional classical data or model capacity finds difficult to learn. This makes the quantum Fisher view a genuinely complementary modality rather than a redundant one. We demonstrate that QUIVER improves standard performance metrics on two benchmark datasets from very different fields: QM9 for predicting molecule properties, and JetClass for predicting jet flavor at the Large Hadron Collider (LHC). The core contribution, however, is domain-agnostic: the quantum Fisher view can be fused into a broad class of model architectures via targeted modifications to the base architecture, to incorporate information about the quantum geometry of the problem. These results demonstrate that quantum-geometric features, extracted from simulated variational circuits, can deliver measurable value for standard machine learning tasks, well before the advent of fault-tolerant quantum hardware.
HEP-EXJun 22, 2023
Triggering Dark Showers with Conditional Dual Auto-EncodersLuca Anzalone, Simranjit Singh Chhibra, Benedikt Maier et al.
We present a family of conditional dual auto-encoders (CoDAEs) for generic and model-independent new physics searches at colliders. New physics signals, which arise from new types of particles and interactions, are considered in our study as anomalies causing deviations in data with respect to expected background events. In this work, we perform a normal-only anomaly detection, which employs only background samples, to search for manifestations of a dark version of strong force applying (variational) auto-encoders on raw detector images, which are large and highly sparse, without leveraging any physics-based pre-processing or strong assumption on the signals. The proposed CoDAE has a dual-encoder design, which is general and can learn an auxiliary yet compact latent space through spatial conditioning, showing a neat improvement over competitive physics-based baselines and related approaches, therefore also reducing the gap with fully supervised models. It is the first time an unsupervised model is shown to exhibit excellent discrimination against multiple dark shower models, illustrating the suitability of this method as an accurate, fast, model-independent algorithm to deploy, e.g., in the real-time event triggering systems of Large Hadron Collider experiments such as ATLAS and CMS.
HEP-EXJun 23, 2023
Autoencoders for Real-Time SUEP DetectionSimranjit Singh Chhibra, Nadezda Chernyavskaya, Benedikt Maier et al.
Confining dark sectors with pseudo-conformal dynamics can produce Soft Unclustered Energy Patterns (SUEP), at the Large Hadron Collider: the production of dark quarks in proton-proton collisions leading to a dark shower and the high-multiplicity production of dark hadrons. The final experimental signature is spherically-symmetric energy deposits by an anomalously large number of soft Standard Model particles with a transverse energy of O(100) MeV. Assuming Yukawa-like couplings of the scalar portal state, the dominant production mode is gluon fusion, and the dominant background comes from multi-jet QCD events. We have developed a deep learning-based Anomaly Detection technique to reject QCD jets and identify any anomalous signature, including SUEP, in real-time in the High-Level Trigger system of the Compact Muon Solenoid experiment at the Large Hadron Collider. A deep convolutional neural autoencoder network has been trained using QCD events by taking transverse energy deposits in the inner tracker, electromagnetic calorimeter, and hadron calorimeter sub-detectors as 3-channel image data. Due to the sparse nature of the data, only ~0.5% of the total ~300 k image pixels have non-zero values. To tackle this challenge, a non-standard loss function, the inverse of the so-called Dice Loss, is exploited. The trained autoencoder with learned spatial features of QCD jets can detect 40% of the SUEP events, with a QCD event mistagging rate as low as 2%. The model inference time has been measured using the Intel CoreTM i5-9600KF processor and found to be ~20 ms, which perfectly satisfies the High-Level Trigger system's latency of O(100) ms. Given the virtue of the unsupervised learning of the autoencoders, the trained model can be applied to any new physics model that predicts an experimental signature anomalous to QCD jets.
HEP-EXMar 24
Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular DetectorsMax Marriott-Clarke, Lazar Novakovic, Elizabeth Ratzer et al.
We propose a novel clustering approach for point-cloud segmentation based on supervised contrastive metric learning (CML). Rather than predicting cluster assignments or object-centric variables, the method learns a latent representation in which points belonging to the same object are embedded nearby while unrelated points are separated. Clusters are then reconstructed using a density-based readout in the learned metric space, decoupling representation learning from cluster formation and enabling flexible inference. The approach is evaluated on simulated data from a highly granular calorimeter, where the task is to separate highly overlapping particle showers represented as sets of calorimeter hits. A direct comparison with object condensation (OC) is performed using identical graph neural network backbones and equal latent dimensionality, isolating the effect of the learning objective. The CML method produces a more stable and separable embedding geometry for both electromagnetic and hadronic particle showers, leading to improved local neighbourhood consistency, a more reliable separation of overlapping showers, and better generalization when extrapolating to unseen multiplicities and energies. This translates directly into higher reconstruction efficiency and purity, particularly in high-multiplicity regimes, as well as improved energy resolution. In mixed-particle environments, CML maintains strong performance, suggesting robust learning of the shower topology, while OC exhibits significant degradation. These results demonstrate that similarity-based representation learning combined with density-based aggregation is a promising alternative to object-centric approaches for point cloud segmentation in highly granular detectors.
HEP-PHMay 26
Particle-Lund Multimodality in Jet TaggersLoukas Gouskos, Benedikt Maier
The Lund plane offers a physics-motivated, hierarchical representation of QCD radiation within jets, while transformer-based taggers have reached state-of-the-art performance by learning directly from raw particle constituents and their pairwise relations. We investigate whether transformers implicitly capture hierarchical QCD structure from constituent-level inputs, or whether explicit physics representations remain complementary. To test this, we introduce PLuM, a multimodal architecture that projects particle constituents and Lund plane splittings into a shared latent space, processing both jointly with a unified transformer. Cross-attention allows the model to probe whether structured QCD information provides discriminating power beyond what particles alone encode. We observe systematic gains for top-quark and $\mathrm{H}\to\mathrm{b}\bar{\mathrm{b}}$ tagging, while finding no comparable improvement for $\mathrm{H}\to\mathrm{c}\bar{\mathrm{c}}$ or $\mathrm{H}\to 4\mathrm{q}$ topologies. This selective enhancement suggests that explicit hierarchical information about b-jet formation remains complementary to raw particle representations even in highly expressive architectures, while other topologies are already well-captured at constituent level. For high-impact LHC analyses such as Lorentz-boosted di-Higgs searches in the four $\mathrm{b}$ quark final state ($\mathrm{H}\mathrm{H}(4\mathrm{b})$), the gains are substantial: at a $25\%$ di-Higgs efficiency working point, PLuM achieves $25\%$ higher background rejection than the baseline. Our results indicate that physically structured representations of QCD radiation retain discriminating value in the transformer era, motivating further study into how different aspects of jet dynamics are encoded by deep learning algorithms.
HEP-PHMar 11, 2024
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation ModelsPhilip Harris, Michael Kagan, Jeffrey Krupa et al.
Self-Supervised Learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose RS3L ("Re-simulation-based self-supervised representation learning"), a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation for contrastive learning in the physical sciences, particularly, in fields that rely on stochastic simulators. By intervening in the middle of the simulation process and re-running simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pre-training enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
INS-DETOct 26, 2025
Sub-microsecond Transformers for Jet Tagging on FPGAsLauri Laatu, Chang Sun, Arianna Cox et al.
We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, their computational complexity prohibits use in real-time applications, such as the hardware trigger system of the collider experiments up until now. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving $\mathcal{O}(100)$ nanosecond latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances the next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
HEP-PHOct 20, 2025
QINNs: Quantum-Informed Neural NetworksAritra Bal, Markus Klute, Benedikt Maier et al.
Classical deep neural networks can learn rich multi-particle correlations in collider data, but their inductive biases are rarely anchored in physics structure. We propose quantum-informed neural networks (QINNs), a general framework that brings quantum information concepts and quantum observables into purely classical models. While the framework is broad, in this paper, we study one concrete realisation that encodes each particle as a qubit and uses the Quantum Fisher Information Matrix (QFIM) as a compact, basis-independent summary of particle correlations. Using jet tagging as a case study, QFIMs act as lightweight embeddings in graph neural networks, increasing model expressivity and plasticity. The QFIM reveals distinct patterns for QCD and hadronic top jets that align with physical expectations. Thus, QINNs offer a practical, interpretable, and scalable route to quantum-informed analyses, that is, tomography, of particle collisions, particularly by enhancing well-established deep learning approaches.