BUFF: Boosted Decision Tree based Ultra-Fast Flow matching

arXiv:2404.18219v1 | 3 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses slow and inaccurate simulation of tabular data in high energy physics, although it appears incremental because it combines existing techniques (boosted trees and flow matching).

The paper tackles the challenge of simulating high-dimensional tabular data with complex correlations in high energy physics by integrating Gradient Boosted Trees with conditional flow matching. It shows that this approach speeds up training and inference for high-level simulation tasks by orders of magnitude while maintaining competitive performance.
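For reference, the conditional flow matching objective that a tree-based velocity regressor would be trained against can be written as follows. This assumes the commonly used straight-line path between a noise sample $x_0$ and a data sample $x_1$; the exact probability path used in BUFF is not stated here and may differ.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad x_1 \sim p_{\text{data}}, \quad t \sim \mathcal{U}[0, 1]

\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1} \big\| v_\theta(t, x_t) - (x_1 - x_0) \big\|^2

\frac{\mathrm{d}x}{\mathrm{d}t} = v_\theta(t, x), \qquad x(0) \sim \mathcal{N}(0, I)
```

Because the loss is a plain squared-error regression onto $x_1 - x_0$, any regressor, including a gradient boosted tree ensemble, can in principle play the role of $v_\theta$. For the conditional generation mentioned in the abstract, $v_\theta$ would additionally take the conditioning variables as input.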

Tabular data is one of the most frequently encountered data types in high energy physics. Unlike typically homogeneous data such as pixelated images, high-dimensional tabular data is often hard to simulate, and its correlations are hard to capture accurately, even with the most advanced architectures. Motivated by findings that tree-based models outperform deep learning models on tasks specific to tabular data, we adopt conditional flow matching, a very recent class of generative models, and employ different techniques to integrate Gradient Boosted Trees into it. Performance is evaluated on various tasks at different analysis levels using several public datasets. We demonstrate that training and inference for most high-level simulation tasks can be sped up by orders of magnitude. The approach can also be extended to low-level feature simulation and conditional generation with competitive performance.
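The abstract describes replacing the neural velocity network of conditional flow matching with gradient boosted trees. Below is a minimal pure-Python sketch of that idea under stated assumptions: a straight-line interpolation path, one boosted regressor per discretized time level (a ForestFlow-style scheme; BUFF's actual scheme, per-feature handling, and choice of GBT library may differ), and a hypothetical two-peak 1-D dataset. `BoostedStumps`, `sample_data`, `train_flow`, and `generate` are illustrative names, not the paper's code.

```python
import random

# Sketch only: assumes a straight-line path and one regressor per time level
# (ForestFlow-style). This is not BUFF's implementation.


class BoostedStumps:
    """Squared-error gradient boosting with depth-1 trees on a 1-D input.

    A deliberately tiny stand-in for a real GBT library (e.g. XGBoost).
    """

    def __init__(self, n_rounds=50, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps = []  # (threshold, left_value, right_value) per round

    def fit(self, xs, ys):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        xs_s = [xs[i] for i in order]
        ys_s = [ys[i] for i in order]
        n = len(xs_s)
        self.base = sum(ys_s) / n
        pred = [self.base] * n
        for _ in range(self.n_rounds):
            # For squared error the negative gradient is the residual.
            res = [y - p for y, p in zip(ys_s, pred)]
            total = sum(res)
            best_gain, best_i, left_sum = -1.0, None, 0.0
            for i in range(1, n):
                left_sum += res[i - 1]
                if xs_s[i] == xs_s[i - 1]:
                    continue  # no threshold separates equal inputs
                # Maximising this term minimises the post-split squared error.
                gain = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
                if gain > best_gain:
                    best_gain, best_i = gain, i
            if best_i is None:
                break
            left_val = sum(res[:best_i]) / best_i
            right_val = sum(res[best_i:]) / (n - best_i)
            threshold = 0.5 * (xs_s[best_i - 1] + xs_s[best_i])
            self.stumps.append((threshold, left_val, right_val))
            for j in range(n):
                pred[j] += self.learning_rate * (left_val if j < best_i else right_val)
        return self

    def predict(self, x):
        out = self.base
        for threshold, left_val, right_val in self.stumps:
            out += self.learning_rate * (left_val if x < threshold else right_val)
        return out


def sample_data(rng):
    """Hypothetical 1-D 'physics feature': two narrow peaks at -2 and +2."""
    centre = -2.0 if rng.random() < 0.5 else 2.0
    return rng.gauss(centre, 0.3)


def train_flow(n_steps=10, n_pairs=500, seed=0):
    """Fit one velocity regressor per time level t_k = k / n_steps."""
    rng = random.Random(seed)
    models = []
    for k in range(n_steps):
        t = k / n_steps
        xs, ys = [], []
        for _ in range(n_pairs):
            x0 = rng.gauss(0.0, 1.0)          # noise sample
            x1 = sample_data(rng)             # data sample
            xs.append((1 - t) * x0 + t * x1)  # point on the straight-line path
            ys.append(x1 - x0)                # conditional velocity target
        models.append(BoostedStumps().fit(xs, ys))
    return models


def generate(models, n_samples=1000, seed=1):
    """Integrate dx/dt = v(t, x) from noise with explicit Euler steps."""
    rng = random.Random(seed)
    dt = 1.0 / len(models)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for model in models:  # model k approximates v(t_k, .)
            x += dt * model.predict(x)
        samples.append(x)
    return samples
```

Calling `generate(train_flow())` should give samples concentrated around the two peaks at ±2, even though each regressor only ever sees noisy per-pair targets $x_1 - x_0$; the trees average those targets into an estimate of the marginal velocity. Fitting a separate ensemble per time level keeps each regression problem simple and fast, at the cost of tying sampling to the same fixed time grid used in training.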

Code implementations: 1 repo