HEP-PH · LG · DATA-AN · Dec 21, 2016

Stacking machine learning classifiers to identify Higgs bosons at the LHC

arXiv:1612.07725v3 · 32 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving classification accuracy and computational efficiency for particle physicists at the LHC, but it is an incremental advance because it builds on existing stacking and MVA methods.

The paper tackled the problem of classifying signal and background events in particle physics, specifically identifying Higgs bosons at the LHC, by comparing stacked generalization against a deep neural network (DNN) and boosted decision trees. In a cut-and-count analysis, stacking performed about 16% worse than the DNN while requiring far less computation, but it outperformed boosted decision trees. Feeding the stacked classifiers into a multivariate analysis (MVA) instead of a cut-and-count analysis significantly enhanced the statistical significance.
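
Stacked generalization trains several base classifiers, collects their out-of-fold scores, and fits a meta-classifier on those scores. The sketch below illustrates the idea in plain Python on toy data. It is not the paper's pipeline: the three base learners (nearest centroid, a one-split decision stump, k-nearest neighbours), the logistic-regression meta-learner, the 5-fold split and the two-Gaussian "events" are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def make_events(n, seed):
    """Hypothetical toy data: 2-D Gaussians, signal (1) centred at (1, 1), background (0) at (0, 0)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        X.append((rng.gauss(label, 1.0), rng.gauss(label, 1.0)))
        y.append(label)
    return X, y

def d2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fit_centroid(X, y):
    """Nearest-centroid scorer: closer to the signal mean than the background mean gives a score above 0.5."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    sig = mean([x for x, t in zip(X, y) if t == 1])
    bkg = mean([x for x, t in zip(X, y) if t == 0])
    return lambda x: sigmoid(d2(x, bkg) - d2(x, sig))

def fit_stump(X, y):
    """One-split decision tree; the score is the signal fraction of the leaf an event lands in."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [t for x, t in zip(X, y) if x[j] <= thr]
            right = [t for x, t in zip(X, y) if x[j] > thr]
            if not left or not right:
                continue
            # Correct classifications if each leaf predicts its majority class.
            hits = max(sum(left), len(left) - sum(left)) + max(sum(right), len(right) - sum(right))
            if best is None or hits > best[0]:
                best = (hits, j, thr, sum(left) / len(left), sum(right) / len(right))
    _, j, thr, p_left, p_right = best
    return lambda x: p_left if x[j] <= thr else p_right

def fit_knn(X, y, k=15):
    """k-nearest-neighbour scorer: fraction of signal among the k closest training events."""
    data = list(zip(X, y))
    def score(x):
        nearest = sorted(data, key=lambda p: d2(p[0], x))[:k]
        return sum(t for _, t in nearest) / k
    return score

def fit_logistic(F, y, lr=1.0, epochs=300):
    """Meta-learner: logistic regression on base-learner scores, fitted by batch gradient descent."""
    w = [0.0] * (len(F[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, t in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + w[-1]) - t
            for i, fi in enumerate(f):
                grad[i] += err * fi
            grad[-1] += err
        w = [wi - lr * g / len(F) for wi, g in zip(w, grad)]
    return lambda f: sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + w[-1])

BASE_LEARNERS = [fit_centroid, fit_stump, fit_knn]

def fit_stack(X, y, n_folds=5):
    """Stacked generalization: fit the meta-learner on out-of-fold base-learner scores."""
    meta_X = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in BASE_LEARNERS]
        for i in range(fold, len(X), n_folds):  # events held out of this fold
            meta_X[i] = [m(X[i]) for m in models]
    meta = fit_logistic(meta_X, y)
    # Refit the base learners on all training events for use at prediction time.
    final = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda x: meta([m(x) for m in final])

def accuracy(scorer, X, y, cut=0.5):
    return sum((scorer(x) > cut) == bool(t) for x, t in zip(X, y)) / len(y)

if __name__ == "__main__":
    X_train, y_train = make_events(400, seed=1)
    X_test, y_test = make_events(200, seed=2)
    stack = fit_stack(X_train, y_train)
    print(f"stacked test accuracy: {accuracy(stack, X_test, y_test):.3f}")
```

Training the meta-learner on out-of-fold scores, rather than on scores from models that have already seen the same events, keeps it from over-trusting whichever base learner memorizes the training set (the kNN scorer, which counts each event as its own neighbour, would be the obvious culprit here).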

Machine learning (ML) algorithms have been employed to classify signal and background events with high accuracy in particle physics. In this paper, we compare the performance of a widespread ML technique, namely stacked generalization, against the results of two state-of-the-art algorithms: (1) a deep neural network (DNN) in the task of discovering a new neutral Higgs boson, and (2) a scalable machine learning system for tree boosting, in the Standard Model Higgs to tau leptons channel, both at the 8 TeV LHC. In a cut-and-count analysis, stacking three algorithms performed around 16% worse than the DNN while demanding far less computational effort; however, the same stacking outperforms boosted decision trees. Using the stacked classifiers in a multivariate statistical analysis (MVA), on the other hand, significantly enhances the statistical significance compared to cut-and-count in both Higgs processes, suggesting that combining an ensemble of simpler and faster ML algorithms with MVA tools is a better approach than building a complex state-of-the-art algorithm for cut-and-count.
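
The abstract contrasts a single cut on the classifier output with a multivariate analysis of the full output distribution. A minimal sketch of why the latter can gain significance follows, assuming the median (Asimov) discovery significance Z = sqrt(2((s + b) ln(1 + s/b) - s)) for a Poisson counting experiment, and assuming the MVA is approximated by binning the classifier score and combining independent bins as sqrt(sum of Z_i^2). The paper's exact statistical treatment is not reproduced here; the function names are hypothetical and the bin yields are made-up numbers, not results from the paper.

```python
import math

def asimov_z(s, b):
    """Median discovery significance for s signal and b background events (Asimov approximation)."""
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))

def combined_z(bins):
    """Independent bins: the log-likelihood ratios add, so Z_comb = sqrt(sum of Z_i^2)."""
    return math.sqrt(sum(asimov_z(s, b) ** 2 for s, b in bins))

# Hypothetical (signal, background) yields in bins of classifier score, lowest score first.
score_bins = [(2.0, 400.0), (5.0, 200.0), (10.0, 80.0), (15.0, 30.0), (20.0, 10.0)]

# Cut-and-count: keep only the two highest-score bins and count them as one.
cut_s = sum(s for s, _ in score_bins[3:])
cut_b = sum(b for _, b in score_bins[3:])

if __name__ == "__main__":
    print(f"cut-and-count Z = {asimov_z(cut_s, cut_b):.2f}")
    print(f"binned (MVA-style) Z = {combined_z(score_bins):.2f}")
```

Because the bins are independent, their log-likelihood ratios add, and keeping events separated by score never lowers the Asimov significance relative to merging them. With these made-up yields the binned combination comes out near 5.8, against about 4.9 for the single cut.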
