BoostTree and BoostForest for Ensemble Learning
This work addresses the need for more accurate and reliable ensemble models in fields such as biology, engineering, and healthcare. Its contribution is incremental, since it builds on existing boosting and bagging methods.
The paper aims to improve ensemble learning performance by proposing BoostForest, which uses BoostTrees as base learners and injects randomness at two levels: cut-points are drawn randomly when each tree is constructed, and each tree is trained on a bootstrap sample of the training data. BoostForest generally outperformed four classical ensemble learning approaches on 35 classification and regression datasets.
Bootstrap aggregating (bagging) and boosting are two popular ensemble learning approaches that combine multiple base learners into a composite model for more accurate and more reliable performance. They have been widely used in biology, engineering, healthcare, and other fields. This paper proposes BoostForest, an ensemble learning approach that uses BoostTrees as base learners and can be applied to both classification and regression. BoostTree constructs a tree model by gradient boosting and increases randomness (diversity) by drawing the cut-points randomly at node splitting. BoostForest further increases randomness by bootstrapping the training data when constructing the different BoostTrees. BoostForest generally outperformed four classical ensemble learning approaches (Random Forest, Extra-Trees, XGBoost and LightGBM) on 35 classification and regression datasets. Remarkably, BoostForest tunes its parameters by simply sampling them randomly from a parameter pool, which can be easily specified, and its ensemble learning framework can also be used to combine many other base learners.
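The three sources of randomness described above (random cut-points, bootstrapped training data, and parameters sampled from a pool) can be illustrated with a minimal sketch. This is not the authors' BoostTree or BoostForest: the base learner here is a simplified stand-in, namely gradient boosting with squared loss over one-split regressors whose cut-points are drawn at random. All function names, the parameter names n_rounds and learning_rate, and the toy data are hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def fit_random_stump(X, residuals, rng):
    """Fit a one-split regressor whose feature and cut-point are drawn at random."""
    j = rng.randrange(len(X[0]))                  # random feature index
    values = [row[j] for row in X]
    cut = rng.uniform(min(values), max(values))   # random cut-point, not an optimized one
    left = [r for row, r in zip(X, residuals) if row[j] <= cut]
    right = [r for row, r in zip(X, residuals) if row[j] > cut]
    overall = fmean(residuals)
    left_value = fmean(left) if left else overall
    right_value = fmean(right) if right else overall
    return j, cut, left_value, right_value


def stump_predict(stump, row):
    j, cut, left_value, right_value = stump
    return left_value if row[j] <= cut else right_value


def fit_boosted_learner(X, y, n_rounds, learning_rate, rng):
    """Stand-in base learner (assumption): boosted random-cut stumps, squared loss."""
    base = fmean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_random_stump(X, residuals, rng)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def learner_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def fit_bagged_ensemble(X, y, n_learners, param_pool, seed=0):
    """Train each base learner on a bootstrap sample with randomly sampled parameters."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]        # sample with replacement
        params = {name: rng.choice(options) for name, options in param_pool.items()}
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        models.append(fit_boosted_learner(X_boot, y_boot, rng=rng, **params))
    return models


def ensemble_predict(models, row):
    """Average the base learners' predictions (regression)."""
    return fmean([learner_predict(m, row) for m in models])


# Toy regression problem (hypothetical data): y = 2x on a single feature.
X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] for row in X]
param_pool = {"n_rounds": [20, 40], "learning_rate": [0.3, 0.5]}
models = fit_bagged_ensemble(X, y, n_learners=20, param_pool=param_pool, seed=0)
print(ensemble_predict(models, [10.0]))
```

Because the bagging loop only calls a fit function and a predict function, the stand-in base learner could be swapped for another one without changing the loop, which mirrors the abstract's remark that the framework can combine other base learners.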