MACK: Mismodeling Addressed with Contrastive Knowledge
This work addresses a critical issue for high energy physics researchers by reducing the simulation biases that complex models inherit, though the contribution appears incremental because it builds on existing contrastive learning techniques.
The paper tackles the sensitivity of machine learning models in high energy physics to mismodeling between simulation and real data. It presents a contrastive learning method that mitigates this effect without prior knowledge of the specifics of the mismodeling, and it reports significant improvements in jet-tagging tasks at the Large Hadron Collider.
The use of machine learning methods in high energy physics typically relies on large volumes of precise simulation for training. As machine learning models become more complex, they can become increasingly sensitive to differences between this simulation and the real data collected by experiments. We present a generic methodology based on contrastive learning that greatly mitigates this negative effect. Crucially, the method does not require prior knowledge of the specifics of the mismodeling. While we demonstrate the efficacy of this technique on the task of jet-tagging at the Large Hadron Collider, it is applicable to a wide array of tasks both in and out of the field of high energy physics.
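The abstract does not specify the contrastive objective, so the following is only a minimal sketch of what such a loss can look like, assuming an InfoNCE-style formulation (a common choice in contrastive learning, not necessarily the one used in MACK). It assumes that row i of `z_a` and row i of `z_b` are embeddings of a positive pair, for example the same jet under two different simulation variants, and that all other rows act as negatives. The names `info_nce_loss`, `z_a`, `z_b`, and `temperature` are illustrative and do not come from the paper.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss for a batch of paired embeddings.

    Assumption (not from the paper): z_a[i] and z_b[i] form a positive
    pair, and every z_b[j] with j != i is treated as a negative for z_a[i].
    """
    # Normalize rows so the dot product below is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature.
    logits = z_a @ z_b.T / temperature

    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for row i sits on the diagonal.
    return -np.mean(np.diag(log_prob))


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 4))
loss_matched = info_nce_loss(embeddings, embeddings)
loss_mismatched = info_nce_loss(embeddings, embeddings[::-1])
```

Minimizing a loss of this kind pulls the two views of each example together in embedding space and pushes different examples apart. In this toy setup the correctly matched pairing yields a lower loss than the reversed pairing, because each embedding is most similar to itself.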