LG HEP-EX, Aug 9, 2025

Approaching Maximal Information Extraction in Low-Signal Regimes via Multiple Instance Learning

arXiv:2508.07114v14.1

Originality: Incremental advance

AI Analysis

This work addresses how to improve machine learning model accuracy in regimes where state-of-the-art classifiers struggle, offering an incremental advance for domains such as high-energy physics.

The paper tackles the problem of improving prediction precision and discriminative power in low-signal regimes by proposing a Multiple Instance Learning (MIL) methodology, demonstrating its potential to approach maximal Fisher Information extraction in applications like constraining Wilson coefficients in particle physics.

In this work, we propose a new machine learning (ML) methodology to obtain more precise predictions for parameters of interest in a given hypothesis testing problem. The proposed method also gives ML models more discriminative power in cases where it is extremely challenging for state-of-the-art classifiers to make accurate predictions at all, and it allows us to systematically decrease the error in the models' predictions. We provide a mathematical motivation for why Multiple Instance Learning (MIL) models would have more predictive power than their single-instance counterparts. We support our theoretical claims by analyzing how the behavior of MIL models scales with the number of instances on which the model makes predictions. As a concrete application, we constrain Wilson coefficients of the Standard Model Effective Field Theory (SMEFT) using kinematic information from subatomic particle collision events at the Large Hadron Collider (LHC). We show that, under certain circumstances, it might be possible to extract the theoretical maximum Fisher Information latent in a dataset.
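The scaling claim in the abstract can be illustrated with a toy model. The sketch below is not the authors' method or code. It assumes each event is a single number drawn from a unit-variance Gaussian whose mean is 0 under one hypothesis and a small, hypothetical shift `delta` under the other, and that every event in a bag shares the same hypothesis. In this setting the optimal test sums per-event log-likelihood ratios, and its accuracy grows with bag size as Φ(delta·√n / 2), which is nearly a coin flip for one event but substantial for large bags. All names and parameter values here are illustrative assumptions.

```python
import math
import random


def bag_accuracy(n_instances, delta=0.1, n_bags=2000, seed=0):
    """Simulated accuracy of the likelihood-ratio test on bags of events.

    Toy assumption (not from the paper): each event is one number drawn
    from N(0, 1) under H0 or N(delta, 1) under H1, and all events in a
    bag come from the same hypothesis.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_bags):
        label = rng.random() < 0.5  # True -> H1, False -> H0
        mean = delta if label else 0.0
        xs = [rng.gauss(mean, 1.0) for _ in range(n_instances)]
        # Sum of per-event log-likelihood ratios log p1(x) / p0(x).
        llr = sum(delta * x - 0.5 * delta ** 2 for x in xs)
        correct += (llr > 0) == label
    return correct / n_bags


def expected_accuracy(n_instances, delta=0.1):
    """Analytic accuracy Phi(delta * sqrt(n) / 2) for the same toy model."""
    z = delta * math.sqrt(n_instances) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    for n in (1, 10, 100, 400):
        print(n, bag_accuracy(n), round(expected_accuracy(n), 3))
```

In this toy model the Fisher information about the mean is 1 per event and n per bag, so a bag-level decision can use information that no single-event classifier has access to. Whether a trained MIL network reaches that bound in a realistic SMEFT analysis is the question the paper itself investigates.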
