LG | HEP-EX | DATA-AN | ML | May 28, 2019

Machine Learning on data with sPlot background subtraction

arXiv:1905.11719v5 | 11 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a specific problem in high energy physics data analysis: making machine learning reliable on data weighted with sPlot. The advance is incremental, since it builds on the existing sPlot method.

The paper tackles the problem of training machine learning algorithms on data whose background is subtracted with the sPlot technique. The negative weights that sPlot assigns can make training diverge. The authors propose a mathematically rigorous method for obtaining signal probabilities without negative weights, which makes standard machine learning tools usable on such data.
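To make the divergence concrete, here is a minimal sketch, not taken from the paper, of a weighted binary cross-entropy evaluated on a single event that is labelled signal but carries a negative weight. The function name `weighted_log_loss` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def weighted_log_loss(p, y, w):
    """Weighted binary cross-entropy: sum_i w_i * per-event log loss."""
    p = np.clip(p, 1e-300, 1.0 - 1e-12)  # keep the logarithms finite
    per_event = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(np.sum(w * per_event))

# Toy case (assumption): one event labelled signal with weight -0.5.
y = np.array([1.0])
w = np.array([-0.5])

# Push the predicted signal probability towards zero.
predictions = (0.5, 1e-3, 1e-9, 1e-30)
losses = [weighted_log_loss(np.array([p]), y, w) for p in predictions]

for p, loss in zip(predictions, losses):
    print(f"p = {p:g}  loss = {loss:.3f}")
```

The per-event loss is positive, so a negative weight turns it into a reward: the total keeps decreasing as the prediction moves away from the label, and a sufficiently flexible model can exploit this in any region where negative weights dominate.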

Data analysis in high energy physics often deals with data samples consisting of a mixture of signal and background events. The sPlot technique is a common method to subtract the contribution of the background by assigning weights to events. Some of the weights are negative by design. Negative weights cause the training of some machine learning algorithms to diverge, because the loss function has no lower bound. In this paper we propose a mathematically rigorous way to train machine learning algorithms on data samples with background described by sPlot to obtain signal probabilities conditioned on observables, without encountering negative event weights at all. This allows any out-of-the-box machine learning method to be used on such data.
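The abstract does not spell out the construction, so the following is only a sketch of one idea consistent with its goal, not the authors' implementation. It relies on a standard sPlot property: when the control variable is independent of the discriminating variable within each component, the average signal sWeight at a given value of the control variable estimates the signal probability there. The sWeights can then serve as regression targets rather than as event weights. The toy distributions, the variable names `m` and `x`, the known yields, and the histogram "regressor" are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sig, n_bkg = 20_000, 20_000  # yields assumed known (normally obtained from a fit)

# Discriminating variable m: Gaussian signal peak on a flat background.
m = np.concatenate([rng.normal(0.0, 1.0, n_sig), rng.uniform(-5.0, 5.0, n_bkg)])
# Control variable x, independent of m within each component.
x = np.concatenate([rng.normal(1.0, 1.0, n_sig), rng.normal(-1.0, 1.0, n_bkg)])
is_signal = np.concatenate([np.ones(n_sig), np.zeros(n_bkg)])  # truth, used only for the check

# Component densities in m, evaluated per event.
f_sig = np.exp(-0.5 * m**2) / np.sqrt(2.0 * np.pi)
f_bkg = np.where(np.abs(m) <= 5.0, 0.1, 0.0)
pdfs = np.stack([f_sig, f_bkg])               # shape (2, n_events)
yields = np.array([n_sig, n_bkg], dtype=float)
denom = yields @ pdfs                         # total density per event

# sPlot: inverse covariance V^-1_jk = sum_i f_j f_k / denom^2, then signal sWeights.
inv_cov = (pdfs / denom) @ (pdfs / denom).T
cov = np.linalg.inv(inv_cov)
sweights = (cov[0] @ pdfs) / denom

# Simplest MSE regressor: the mean target in each bin of x.
inner_edges = np.array([-1.0, 0.0, 1.0])
bin_idx = np.digitize(x, inner_edges)         # bin indices 0..3
n_bins = len(inner_edges) + 1
p_hat = np.array([sweights[bin_idx == b].mean() for b in range(n_bins)])
p_true = np.array([is_signal[bin_idx == b].mean() for b in range(n_bins)])

print("most negative sWeight:", sweights.min())
print("estimated P(signal | x bin):", np.round(p_hat, 3))
print("true signal fraction:       ", np.round(p_true, 3))
```

Individual sWeights are negative in the background-dominated tails of `m`, yet they enter only as targets of a squared-error fit, which is bounded below. The per-bin mean could be replaced by any regressor trained with a mean squared error loss; clipping its output to [0, 1] is a further assumption if strict probabilities are needed.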

Code Implementations: 1 repo