LG · Aug 7, 2023

Generative Forests

arXiv:2308.03648v3 · 5.33 citations · h-index: 36


AI Analysis

This paper addresses the problem of generating realistic tabular data, a data type prevalent in many domains, with a novel approach that also offers practical benefits for related tasks.

The paper tackles generative modeling for tabular data by introducing forest-based models and a training algorithm with convergence guarantees, achieving substantial improvements in generated data quality compared to state-of-the-art methods and showing strong performance on tasks like missing data imputation and density estimation.

We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a new, powerful class of forest-based models fit for such tasks, and a simple training algorithm with strong convergence guarantees in a boosting model that parallels that of the original weak/strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular scheme for decision tree induction (i.e. supervised learning) with two classes. Experiments on the quality of generated data display substantial improvements compared to the state of the art. The losses our algorithm minimizes and the structure of our models make them practical for related tasks that require fast estimation of a density given a generative model and an observation (even a partially specified one): such tasks include missing data imputation and density estimation. Additional experiments on these tasks reveal that our models can be strong contenders against diverse state-of-the-art methods that rely on (or mix elements of) trees, neural nets, kernels or graphical models.
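
The abstract mentions two mechanics that a small sketch can make concrete. What follows is an illustration under stated assumptions, not the authors' method. `best_split` is a classical greedy two-class split search using Gini impurity (as in CART); the abstract says the training algorithm tweaks such a scheme, but which impurity it builds on and what the tweaks are is not shown here. `density` evaluates a hypothetical single-tree, piecewise-uniform density over axis-aligned boxes and integrates out unobserved features, so a partially specified observation still receives a density. The names `gini`, `best_split`, `Leaf` and `density` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


# Part 1 (assumption: CART-style Gini criterion, not the paper's tweaked version).
def gini(pos: float, neg: float) -> float:
    """Gini impurity of a node holding `pos` examples of one class and `neg` of the other."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(xs: Sequence[float], ys: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best split `x <= t` on one feature.

    `ys` holds binary labels (1 for one class, 0 for the other). Returns
    (None, root impurity) when no split lowers the impurity."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    total_neg = n - total_pos
    best_t, best_score = None, gini(total_pos, total_neg)
    left_pos = left_neg = 0
    for i in range(n - 1):
        x, y = pairs[i]
        left_pos += y
        left_neg += 1 - y
        if x == pairs[i + 1][0]:
            continue  # equal values cannot be separated by a threshold
        right_pos, right_neg = total_pos - left_pos, total_neg - left_neg
        score = ((i + 1) * gini(left_pos, left_neg)
                 + (n - i - 1) * gini(right_pos, right_neg)) / n
        if score < best_score:
            best_t, best_score = (x + pairs[i + 1][0]) / 2.0, score
    return best_t, best_score


# Part 2 (assumption: one tree whose leaves are boxes with uniform density inside).
@dataclass
class Leaf:
    mass: float                        # probability mass of the box; masses sum to 1
    bounds: List[Tuple[float, float]]  # half-open [low, high) interval per feature


def density(leaves: List[Leaf], x: Dict[int, float]) -> float:
    """Density of the observed features `x` (feature index -> value).

    Features missing from `x` are integrated out: inside a uniform box this
    simply drops their widths from the volume term."""
    total = 0.0
    for leaf in leaves:
        if all(leaf.bounds[j][0] <= v < leaf.bounds[j][1] for j, v in x.items()):
            width = 1.0
            for j in x:
                lo, hi = leaf.bounds[j]
                width *= hi - lo
            total += leaf.mass / width
    return total
```

For example, with two leaves splitting feature 0 at 0.5 over the unit square, with masses 0.75 and 0.25, the full observation `{0: 0.2, 1: 0.3}` has density 1.5, while observing only feature 1 gives 1.0, since in this toy model the marginal of feature 1 is uniform on [0, 1).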
