Invariant Random Forest: Tree-Based Model Solution for OOD Generalization
The paper addresses OOD generalization for tree-based models. The problem is domain-specific and the contribution is incremental, since it extends existing OOD methods from neural networks to decision trees.
The paper tackles out-of-distribution generalization for decision tree models by introducing the Invariant Decision Tree and its ensemble version, the Invariant Random Forest, which penalize splits whose behavior is unstable across environments. Both outperform non-OOD tree models on synthetic and real datasets.
Out-Of-Distribution (OOD) generalization is an essential topic in machine learning. However, recent research has focused almost exclusively on methods for neural networks. This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT). While the tree is grown, IDT applies a penalty term to splits whose behavior is unstable or varies across different environments. We also construct its ensemble version, the Invariant Random Forest (IRF). The proposed method is motivated by a theoretical result that holds under mild conditions, and it is validated by numerical tests on both synthetic and real datasets. Its superior performance compared to non-OOD tree models implies that considering OOD generalization for tree models is necessary and deserves more attention.
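To make the idea of penalizing unstable splits concrete, the following is a minimal sketch and not the paper's actual criterion. The abstract does not state the form of the penalty, so this sketch assumes one: the pooled Gini gain of a candidate split minus `lam` times the variance of that split's Gini gain across environments. The names `gini`, `gain`, `penalized_score`, and `lam`, as well as the toy data, are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(xs, ys, threshold):
    """Gini impurity reduction from splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return gini(ys) - len(left) / n * gini(left) - len(right) / n * gini(right)


def penalized_score(xs, ys, envs, threshold, lam=1.0):
    """Assumed criterion: pooled gain minus lam * variance of per-environment gains."""
    pooled = gain(xs, ys, threshold)
    per_env = []
    for e in sorted(set(envs)):
        xe = [x for x, g in zip(xs, envs) if g == e]
        ye = [y for y, g in zip(ys, envs) if g == e]
        per_env.append(gain(xe, ye, threshold))
    mean = sum(per_env) / len(per_env)
    variance = sum((g - mean) ** 2 for g in per_env) / len(per_env)
    return pooled - lam * variance


# Toy data: two environments with four samples each.
envs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 1, 1, 0, 0, 1, 1]
# "stable" separates the classes in both environments.
stable = [0, 0, 1, 1, 0, 0, 1, 1]
# "spurious" separates the classes in environment 0 only.
spurious = [0, 0, 1, 1, 0, 1, 0, 1]

stable_score = penalized_score(stable, ys, envs, threshold=0.5)
spurious_score = penalized_score(spurious, ys, envs, threshold=0.5)
spurious_unpenalized = penalized_score(spurious, ys, envs, threshold=0.5, lam=0.0)
```

On this toy data the stable feature has the same gain in both environments, so its score is unaffected by the penalty, while the spurious feature's score drops below its unpenalized gain. A tree grown with such a criterion would favor splits that behave consistently across environments, which is the behavior the abstract describes.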