Multimodal Generative Flows for LHC Jets
This work gives high-energy physics researchers a data-driven simulation tool for LHC jets. The contribution is incremental, since it builds on existing flow-matching methods.
The paper tackles generative modeling of LHC jets, whose particle-cloud data mix continuous and discrete features. It introduces a transformer-based multimodal flow that jointly models both modalities and, trained on CMS Open Data, generates high-fidelity jets with realistic kinematics, substructure, and flavor composition.
Generative modeling of high-energy collisions at the Large Hadron Collider (LHC) offers a data-driven route to simulation, anomaly detection, and other applications. A central challenge lies in the hybrid nature of particle-cloud data: each particle carries continuous kinematic features and discrete quantum numbers such as charge and flavor. We introduce a transformer-based multimodal flow that extends flow matching with a continuous-time Markov jump bridge to jointly model both modalities of LHC jets. Trained on CMS Open Data, our model generates high-fidelity jets with realistic kinematics, jet substructure, and flavor composition.
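To make the hybrid setup concrete, the following is a minimal sketch of how a single training-time sample could be drawn for a particle cloud with both modalities. It is not the paper's implementation. The continuous part uses the standard linear flow-matching interpolant, and the discrete part uses a simple stand-in bridge in which each token has jumped from its source state to its target state with probability t. The function name `sample_hybrid_bridge`, the array shapes, and the choice of jump schedule are all assumptions made for illustration.

```python
import numpy as np

def sample_hybrid_bridge(x0, x1, k0, k1, t, rng):
    """Draw an intermediate state at time t in [0, 1] for a hybrid particle cloud.

    x0, x1: (n_particles, n_features) continuous source/target features.
    k0, k1: (n_particles,) integer source/target discrete labels.
    Hypothetical helper; the bridge below is an illustrative assumption.
    """
    # Continuous modality: linear interpolant used in conditional flow matching.
    x_t = (1.0 - t) * x0 + t * x1
    # Regression target for the velocity field along this straight path.
    u_t = x1 - x0
    # Discrete modality: each token has jumped to its target with probability t.
    jumped = rng.random(k0.shape) < t
    k_t = np.where(jumped, k1, k0)
    return x_t, u_t, k_t

rng = np.random.default_rng(0)
n_particles, n_features, n_classes = 5, 3, 8  # toy sizes, assumed
x0 = rng.standard_normal((n_particles, n_features))  # noise sample
x1 = rng.standard_normal((n_particles, n_features))  # stand-in for data kinematics
k0 = rng.integers(0, n_classes, size=n_particles)    # random source labels
k1 = rng.integers(0, n_classes, size=n_particles)    # stand-in for data labels

x_t, u_t, k_t = sample_hybrid_bridge(x0, x1, k0, k1, t=0.5, rng=rng)
```

In a full model, a network would take `(x_t, k_t, t)` and predict both the velocity `u_t` and the target labels `k1` (or jump rates) for every particle. The sketch only shows how the two modalities share one time variable.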