LG · ML · Jun 11, 2024

Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees

arXiv:2406.07658v2 · 6 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for flexible, non-parametric probabilistic models in domains such as inventory allocation, though the advance is incremental because it combines existing techniques.

The paper tackles probabilistic prediction on tabular data by proposing Treeffuser, a method that learns conditional diffusion models with gradient-boosted trees and yields better-calibrated predictions than existing methods on synthetic and real data.
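One generic way to see why trees can fit a conditional diffusion model is the denoising score matching objective, sketched below. This is the standard textbook form, not necessarily the exact formulation used in the paper; the symbols m(t), sigma(t), lambda(t), and s are notation assumed here for illustration.

```latex
% Forward noising of the response y_0 (Gaussian perturbation kernel, assumed form)
y_t = m(t)\, y_0 + \sigma(t)\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

% Denoising score matching: regress the conditional score on (y_t, x, t)
\min_{s}\; \mathbb{E}_{t,\,(x, y_0),\,\varepsilon}
  \left[ \lambda(t)\, \left\| s(y_t, x, t) + \frac{\varepsilon}{\sigma(t)} \right\|^2 \right]
```

Under these assumptions the target is -epsilon / sigma(t), so the problem reduces to squared-error regression on the tabular features (y_t, x, t), which is the kind of task gradient-boosted trees are built for.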

Probabilistic prediction aims to compute predictive distributions rather than single point predictions. These distributions enable practitioners to quantify uncertainty, compute risk, and detect outliers. However, most probabilistic methods assume parametric responses, such as Gaussian or Poisson distributions. When these assumptions fail, such models lead to bad predictions and poorly calibrated uncertainty. In this paper, we propose Treeffuser, an easy-to-use method for probabilistic prediction on tabular data. The idea is to learn a conditional diffusion model where the score function is estimated using gradient-boosted trees. The conditional diffusion model makes Treeffuser flexible and non-parametric, while the gradient-boosted trees make it robust and easy to train on CPUs. Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks -- including those with multivariate, multimodal, and skewed responses. We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better calibrated probabilistic predictions. We further demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart. We implement Treeffuser in https://github.com/blei-lab/treeffuser.
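To illustrate how samples from a predictive distribution can feed an inventory decision, here is a minimal sketch based on the textbook newsvendor rule. The abstract does not describe the paper's actual inventory formulation, so the rule itself, the function names, the prices, and the demand samples are all assumptions made for illustration; in practice the samples would come from a fitted probabilistic model.

```python
import math


def newsvendor_order(demand_samples, unit_price, unit_cost):
    """Order quantity from the newsvendor rule (an assumed decision model).

    The profit-maximizing order is the demand quantile at the critical
    ratio (price - cost) / price, estimated here from samples.
    """
    critical_ratio = (unit_price - unit_cost) / unit_price
    ordered = sorted(demand_samples)
    # Smallest sample whose empirical CDF reaches the critical ratio.
    k = max(1, math.ceil(critical_ratio * len(ordered)))
    return ordered[k - 1]


def expected_profit(order, demand_samples, unit_price, unit_cost):
    """Average profit over the samples: sell min(order, demand), pay for order."""
    total = 0.0
    for demand in demand_samples:
        total += unit_price * min(order, demand) - unit_cost * order
    return total / len(demand_samples)


# Hypothetical predictive samples of demand for one item (not real data).
samples = [12, 15, 9, 30, 14, 11, 18, 13, 22, 10]
price, cost = 4.0, 1.0  # hypothetical unit price and unit cost

order = newsvendor_order(samples, price, cost)

# Sanity check: brute-force search over the sampled values agrees with the rule.
best_by_search = max(
    samples, key=lambda q: expected_profit(q, samples, price, cost)
)
```

The quantile rule only needs samples, not a parametric form, which is why a non-parametric predictive distribution fits this kind of decision problem.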

Code Implementations: 1 repo
