MEML · Feb 16, 2021

Trees-Based Models for Correlated Data

arXiv:2102.08114v2 · 1 citation
AI Analysis

The paper addresses a specific problem faced by researchers working with correlated datasets. Its contribution is incremental: it modifies existing trees-based methods rather than introducing a new paradigm.

The paper tackles the problem of applying trees-based regression models to correlated data. It proposes an approach that incorporates the correlation structure into the splitting criterion, the stopping rules and the leaf values, and reports that this approach outperforms standard methods in simulations and real-data analyses.

This paper presents a new approach for trees-based regression, such as simple regression trees, random forests and gradient boosting, in settings involving correlated data. We show the problems that arise when implementing standard trees-based regression models, which ignore the correlation structure. Our new approach explicitly takes the correlation structure into account in the splitting criterion, stopping rules and fitted values in the leaves, which induces some major modifications of standard methodology. The superiority of our new approach over trees-based models that do not account for the correlation is supported by simulation experiments and real-data analyses.
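To make "taking the correlation structure into account in the splitting criterion and fitted values" concrete, here is a minimal sketch of one natural way to do it: a generalized-least-squares (GLS) version of a single regression-tree split. It assumes the covariance matrix `V` of the responses is known and evaluates one split of a node on one numeric feature. The leaf values are GLS estimates, and the split loss is the residual quadratic form weighted by `V^{-1}`. This is an illustrative assumption, not necessarily the authors' exact criterion. The function names (`gls_leaf_values`, `gls_split_loss`, `best_gls_split`) are hypothetical. With `V` equal to the identity, the loss reduces to the ordinary sum of squared errors used by standard CART.

```python
import numpy as np


def gls_leaf_values(y, leaf, V_inv):
    """GLS leaf means: beta = (Z' V^-1 Z)^-1 Z' V^-1 y, with Z the leaf-indicator matrix."""
    # One indicator column per distinct leaf label.
    Z = np.column_stack([(leaf == k).astype(float) for k in np.unique(leaf)])
    A = Z.T @ V_inv @ Z
    b = Z.T @ V_inv @ y
    beta = np.linalg.solve(A, b)
    return Z, beta


def gls_split_loss(y, leaf, V_inv):
    """Residual quadratic form r' V^-1 r after fitting GLS leaf means (hypothetical criterion)."""
    Z, beta = gls_leaf_values(y, leaf, V_inv)
    r = y - Z @ beta
    return float(r @ V_inv @ r)


def best_gls_split(x, y, V):
    """Scan thresholds on feature x and return (loss, threshold) with the smallest GLS loss."""
    V_inv = np.linalg.inv(V)
    best_loss, best_t = np.inf, None
    # Drop the largest value so both children are always non-empty.
    for t in np.unique(x)[:-1]:
        leaf = (x > t).astype(int)
        loss = gls_split_loss(y, leaf, V_inv)
        if loss < best_loss:
            best_loss, best_t = loss, t
    return best_loss, best_t


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([0.0, 0.0, 10.0, 10.0])
    # Equicorrelated errors (correlation 0.5) as an illustrative covariance.
    V = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
    print(best_gls_split(x, y, V))
```

The design choice here is that correlation enters in two places, the leaf estimates and the loss. A standard tree that ignores `V` would use plain leaf means and unweighted squared errors, which is the behavior the abstract identifies as problematic.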

Code Implementations: 1 repo