LG · AI · Feb 4, 2024

A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF)

arXiv:2402.03386v1 · 3 citations · h-index: 19 · Applied intelligence (Boston)
AI Analysis

This work offers machine learning practitioners a novel ensemble method that enhances representation learning for tabular data, though it appears incremental because it builds on existing tree-based techniques.

Tree ensemble algorithms such as RandomForest and GradientBoosting cannot perform hierarchical representation learning from raw data, because they cannot be trained with back-propagation. The paper tackles this limitation by proposing Distributed Gradient Boosting Forest (DGBF), which combines bagging and boosting into a graph-structured ensemble with distributed learning. The distributed approach outperformed both methods on 7 of 9 datasets.

Tree ensemble algorithms such as RandomForest and GradientBoosting are currently the dominant methods for modeling discrete or tabular data. However, they are unable to perform hierarchical representation learning from raw data as NeuralNetworks do thanks to their multi-layered structure, a key feature for DeepLearning problems and for modeling unstructured data. This limitation arises because tree algorithms cannot be trained with back-propagation, owing to their mathematical nature. In this work, we demonstrate that the mathematical formulations of bagging and boosting can be combined to define a graph-structured tree ensemble algorithm in which a distributed representation learning process between trees emerges naturally (without using back-propagation). We call this novel approach Distributed Gradient Boosting Forest (DGBF), and we demonstrate that both RandomForest and GradientBoosting can be expressed as particular graph architectures of DGBF. Finally, we show that distributed learning outperforms both RandomForest and GradientBoosting on 7 out of 9 datasets.
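To make the idea of bagging and boosting as two axes of one architecture more concrete, here is a minimal sketch. It is not the paper's DGBF algorithm and does not reproduce its distributed learning between trees. It only illustrates, under simplifying assumptions, how a grid of layers (boosting steps) and trees per layer (bagged learners) contains a bagging-like and a boosting-like ensemble as special cases. The names `fit_stump` and `fit_grid` are hypothetical, squared loss is assumed, and depth-1 stumps stand in for full decision trees.

```python
import random

def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to targets r by squared error."""
    n = len(X)
    mean = sum(r) / n
    best = (float("inf"), None)  # (squared error, split parameters)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, (j, t, ml, mr))
    if best[1] is None:  # no valid split found: predict the mean
        return lambda row: mean
    j, t, ml, mr = best[1]
    return lambda row: ml if row[j] <= t else mr

def fit_grid(X, y, n_layers, n_trees, lr=1.0, bootstrap=True, seed=0):
    """Layered ensemble: bagging inside each layer, boosting across layers.

    Assumption (not from the paper): every tree in a layer is fit to the same
    residual of the running prediction, and a layer outputs the tree average.
    """
    rng = random.Random(seed)
    n = len(X)
    base = sum(y) / n
    pred = [base] * n
    layers = []
    for _ in range(n_layers):
        # Negative gradient of the squared loss 0.5 * (y - pred)^2
        resid = [y[i] - pred[i] for i in range(n)]
        trees = []
        for _ in range(n_trees):
            idx = [rng.randrange(n) for _ in range(n)] if bootstrap else list(range(n))
            trees.append(fit_stump([X[i] for i in idx], [resid[i] for i in idx]))
        layers.append(trees)
        for i in range(n):
            pred[i] += lr * sum(t(X[i]) for t in trees) / n_trees

    def predict(row):
        return base + sum(lr * sum(t(row) for t in trees) / len(trees) for trees in layers)
    return predict

# Toy data (illustrative only): y = x^2 on a 1-D grid
X = [[x / 10] for x in range(40)]
y = [row[0] ** 2 for row in X]

def mse(model):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(X)

bagging_like = fit_grid(X, y, n_layers=1, n_trees=10)            # one layer, many trees
boosting_like = fit_grid(X, y, n_layers=10, n_trees=1, lr=0.5,
                         bootstrap=False)                         # many layers, one tree each
grid = fit_grid(X, y, n_layers=10, n_trees=5, lr=0.5)             # both axes at once
print(mse(bagging_like), mse(boosting_like), mse(grid))
```

With `n_layers=1` the grid collapses to a bagged ensemble in the spirit of RandomForest (without feature subsampling), and with `n_trees=1` and `bootstrap=False` it collapses to plain gradient boosting. The general grid sits between the two, which is the sense in which the two classical methods can be read as particular architectures.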
