Data-driven multinomial random forest: A new random forest variant with strong consistency
This work offers machine learning practitioners a random forest method with a stronger consistency guarantee at the same complexity as BreimanRF, though the contribution is incremental because it builds on existing variants.
The paper proposes a data-driven multinomial random forest (DMRF) that is strongly consistent, that is, consistent with probability 1. In classification and regression tasks, DMRF outperforms previous random forest variants that are only weakly consistent, and it often surpasses BreimanRF in classification.
In this paper, we strengthen the consistency proofs of several previously weakly consistent random forest variants so that they establish strong consistency, and we improve the data utilization of these variants to obtain better theoretical properties and experimental performance. In addition, we propose a data-driven multinomial random forest (DMRF), which has the same complexity as BreimanRF (the random forest proposed by Breiman) while satisfying strong consistency, i.e., consistency with probability 1. DMRF performs better in classification and regression problems than previous RF variants that satisfy only weak consistency, and in most cases it even surpasses BreimanRF in classification tasks. To the best of our knowledge, DMRF is currently a low-complexity and high-performing variant of random forests that achieves strong consistency with probability 1.
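For readers less familiar with the distinction drawn above, the two notions of consistency can be written out as follows. This is a sketch using standard textbook definitions for classification, not notation taken from the paper: $g_n$ (the classifier built from a training sample $D_n$ of size $n$), $L(g_n)$ (its conditional error probability), and $L^*$ (the Bayes risk) are symbols assumed here for illustration.

```latex
% Assumed notation (not from the paper):
%   D_n    : training sample of size n
%   g_n    : classifier built from D_n
%   L(g_n) : P( g_n(X) \neq Y \mid D_n ), the conditional error probability
%   L^*    : the Bayes risk, the smallest achievable error probability
\begin{align*}
\text{weak consistency:}\quad
  & \lim_{n \to \infty} \mathbb{E}\bigl[ L(g_n) \bigr] = L^{*}, \\
\text{strong consistency:}\quad
  & \mathbb{P}\Bigl( \lim_{n \to \infty} L(g_n) = L^{*} \Bigr) = 1 .
\end{align*}
```

Under these definitions, and because the classification error lies in $[0, 1]$, strong consistency implies weak consistency by dominated convergence, while the converse does not hold in general. For regression, the analogous statements are usually made with the squared-error risk in place of $L$.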