Enno Mammen

2papers

2 Papers

MLDec 29, 2020Code
Random Planted Forest: a directly interpretable tree ensemble

Munir Hiabu, Enno Mammen, Joseph T. Meyer

We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.

MLJun 19, 2024
Pure interaction effects unseen by Random Forests

Ricardo Blum, Munir Hiabu, Enno Mammen et al.

Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during tree construction. Motivated from this, it is argued that simple alternative partitioning schemes used in the tree growing procedure can enhance identification of these interactions. In a simulation study these variants are compared to conventional Random Forests and Extremely Randomized Trees. The results validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role. Finally, the methods are applied to real datasets.