DATA-AN LG HEP-EX, Feb 15, 2017

Support Vector Machines and generalisation in HEP

Adrian Bevan, Rodrigo Gamboa Goñi, Jon Hays, Tom Stevenson

arXiv:1702.04686v1


AI Analysis

This work provides incremental improvements for HEP researchers using multivariate analysis to enhance model reliability in tasks like background suppression.

The paper addresses hyper-parameter optimization for Support Vector Machines (SVMs) in High Energy Physics (HEP), where the aim is to avoid overfitting and ensure generalizable performance. It extends the SVM functionality in the TMVA toolkit, adds tools for cross-validation, and compares the hold-out and k-fold methods.

We review the concept of Support Vector Machines (SVMs) and discuss examples of their use in a number of scenarios. Several SVM implementations have been used in HEP, and we illustrate the algorithm using the implementation in the Toolkit for Multivariate Analysis (TMVA). We discuss examples relevant to HEP, including background suppression for $H\to\tau^+\tau^-$ at the LHC with several different kernel functions. Performance benchmarking leads to the issue of generalisation of hyper-parameter selection. The avoidance of fine tuning (overtraining or overfitting) in MVA hyper-parameter optimisation, i.e. the ability to ensure generalised performance of an MVA that is independent of the training, validation and test samples, is of utmost importance. We discuss this issue and compare and contrast the performance of hold-out and k-fold cross-validation. We have extended the SVM functionality and introduced tools to facilitate cross-validation in TMVA, and we present results based on these improvements.
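The difference between the hold-out and k-fold schemes mentioned in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the TMVA implementation: the function names (`holdout_split`, `kfold_splits`, `cross_validated_score`), the `test_fraction` and `seed` parameters, and the user-supplied `score_fn` callback are all assumptions introduced here for the example.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) for one random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Train on all folds except the one held back for validation.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validated_score(score_fn, n_samples, k=5, seed=0):
    """Average a hypothetical score_fn(train_idx, validation_idx) over k folds."""
    scores = [score_fn(train, val) for train, val in kfold_splits(n_samples, k, seed)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    train, test = holdout_split(10, test_fraction=0.3)
    print(len(train), len(test))  # 7 3
    for train, val in kfold_splits(10, k=5):
        print(sorted(val))
```

In the hold-out scheme each event is used either for training or for testing, so the performance estimate depends on a single split. In the k-fold scheme every event serves as validation data exactly once, and averaging the score over the k folds gives an estimate that is less sensitive to how any one split happened to fall.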
