Diversity Conscious Refined Random Forest
This work addresses efficiency and redundancy issues that practitioners face in ensemble learning, but it is incremental because it builds on the standard Random Forest method.
The paper tackles the high inference cost and model redundancy of Random Forest by proposing a Refined Random Forest Classifier that dynamically grows trees on informative features and enforces diversity through clustering. On 8 benchmark datasets, it achieves improved accuracy compared to standard RF.
Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and model redundancy. In this work, our goal is to grow trees dynamically on informative features only and then enforce maximal diversity by clustering the trees and retaining uncorrelated ones. To this end, we propose a Refined Random Forest Classifier that iteratively refines itself: it first removes the least informative features, then analytically determines how many new trees should be grown, and finally applies correlation-based clustering to remove redundant trees. We compare the classification accuracy of our model against a standard RF with the same number of trees. Experiments on 8 benchmark datasets, including binary and multiclass datasets, demonstrate that the proposed model achieves improved accuracy compared to standard RF.
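To make the correlation-based pruning step more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the correlation measure, the clustering algorithm, or how a representative tree is chosen, so the sketch assumes Pearson correlation between per-tree prediction vectors on a validation set, a greedy threshold-based grouping in place of a full clustering algorithm, and retention of the highest-scoring tree in each group. The names `pearson`, `prune_correlated_trees`, `threshold`, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant prediction vector has no defined correlation; treat it as 0.
        return 0.0
    return cov / sqrt(var_a * var_b)


def prune_correlated_trees(tree_preds, tree_scores, threshold=0.9):
    """Assumed greedy grouping: visit trees from best to worst score and keep
    a tree only if its correlation with every already-kept tree is below
    `threshold`. Returns the sorted indices of the retained trees."""
    order = sorted(range(len(tree_preds)),
                   key=lambda i: tree_scores[i], reverse=True)
    kept = []
    for i in order:
        if all(pearson(tree_preds[i], tree_preds[j]) < threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy binary predictions of four hypothetical trees on six validation samples.
preds = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],  # identical to tree 0, so fully redundant
    [0, 1, 1, 0, 1, 0],  # negatively correlated with tree 0
    [1, 0, 1, 1, 0, 1],  # differs from tree 0 on one sample
]
scores = [0.90, 0.85, 0.80, 0.70]  # hypothetical validation accuracies

kept = prune_correlated_trees(preds, scores, threshold=0.6)
print(kept)  # [0, 2]
```

In this toy run, trees 1 and 3 are dropped because their predictions correlate strongly with the better-scoring tree 0, while tree 2 is retained. The sketch uses signed correlation, so strongly anti-correlated trees count as diverse; whether the proposed method treats them that way is not stated in the abstract.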