LGDec 16, 2023

On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

arXiv:2312.11540v2h-index: 16
AI Analysis

This addresses efficiency in random forest deployment for binary classification, but it is incremental as it focuses on specific theoretical constraints.

The paper tackles the problem of reducing the number of trees in a random forest while maintaining accuracy, showing that for binary decision problems, the majority function can be represented by fewer trees with polynomial size if the difference between the original and reduced tree counts is constant, and similar results hold with small error for general cases.

In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees using a smaller bag of decision trees, where we only consider binary decision problems on the binary domain and simple decision trees in which an internal node is limited to querying the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$ and $T$ must be odd (in order to avoid the tie break). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees each with polynomial size if $n-T$ is a constant and a small classification error is allowed. A related result on the $k$-out-of-$n$ functions is presented too.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes