Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?
This work addresses a practical decision problem for researchers and practitioners who use tree-based ensembles for multi-output regression. Its contribution is incremental: it compares existing methods rather than introducing new ones.
The paper investigates whether multivariate ensemble techniques outperform separate univariate models in multi-output regression. Based on extensive simulations, it finds that multivariate approaches can be beneficial in certain scenarios.
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods. They are used in particular for predicting univariate responses. With multiple outputs, the question arises whether to fit separate univariate models or to follow a multivariate approach directly. For the latter, several possibilities exist, e.g., ones based on modified splitting or stopping rules for multi-output regression. In this work, we compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.
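To make the distinction between separate univariate fits and a modified splitting rule concrete, the following minimal sketch compares the two on a single split. It assumes one simple multivariate criterion, namely the within-node squared error summed over all outputs; this is an illustrative choice and not necessarily the rule used by any of the methods compared in this work. The helper names (`sse`, `split_cost`, `best_split`) and the toy data are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def split_cost(x, Y, threshold, outputs):
    """Within-node SSE, summed over the chosen outputs, after splitting at x <= threshold."""
    left = [row for xi, row in zip(x, Y) if xi <= threshold]
    right = [row for xi, row in zip(x, Y) if xi > threshold]
    return sum(sse([row[j] for row in side]) for side in (left, right) for j in outputs)


def best_split(x, Y, outputs):
    """Exhaustive search over midpoints between consecutive unique x values."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_cost(x, Y, t, outputs))


# Hypothetical toy data: output 0 jumps at x = 2.5, output 1 jumps at x = 4.5.
x = [1, 2, 3, 4, 5, 6]
Y = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 3], [1, 3]]

# Univariate approach: each output gets its own split.
univariate = [best_split(x, Y, [j]) for j in range(2)]

# Multivariate approach (assumed criterion): one shared split for both outputs.
multivariate = best_split(x, Y, [0, 1])

print("separate splits:", univariate)
print("shared split:", multivariate)
```

In this toy case the separate fits recover each output's own jump, whereas the shared split follows the output with the larger jump, because unscaled squared errors let the higher-variance output dominate the summed criterion. Whether sharing structure across outputs helps or hurts in practice is exactly the kind of question the simulations are meant to address.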