Consensual Aggregation on Random Projected High-dimensional Features for Regression
This work addresses the challenge of efficiently merging large numbers of redundant regression models without model selection or cross-validation. The contribution is incremental, as it builds on existing aggregation and projection techniques.
The paper tackles the problem of aggregating predictions from many redundant regression estimators. It proposes a two-step method: random projection to reduce the dimensionality of the predictions, followed by kernel-based consensual aggregation. It shows theoretically and numerically that, with high probability, the method performs close to aggregation on the original high-dimensional features, and it validates this on synthetic and real datasets.
In this paper, we present a study of kernel-based consensual aggregation on randomly projected high-dimensional features of predictions for regression. The aggregation scheme is composed of two steps. In the first step, the high-dimensional features of predictions, given by a large number of regression estimators, are randomly projected into a smaller subspace using the Johnson-Lindenstrauss Lemma. In the second step, a kernel-based consensual aggregation is implemented on the projected features. We show theoretically that, with high probability, the performance of the aggregation scheme is close to that of the aggregation implemented on the original high-dimensional features. Moreover, we illustrate numerically that the aggregation scheme maintains its performance on very large and highly correlated features of predictions given by different types of machines. The aggregation scheme allows us to flexibly merge a large number of redundant machines that are plainly constructed, without model selection or cross-validation. The efficiency of the proposed method is illustrated through several experiments on different types of synthetic and real datasets.
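The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian projection matrix is one standard Johnson-Lindenstrauss construction, the Gaussian kernel and the Nadaraya-Watson-style weighted average are assumed forms of "kernel-based consensual aggregation", and the names `random_projection`, `kernel_aggregate`, the bandwidth `h`, the toy machines, and the data split are all hypothetical.

```python
import numpy as np


def random_projection(Z, m, rng):
    """Step 1 (assumed form): project rows of Z (n x M) to m dimensions
    with a Gaussian matrix scaled by 1/sqrt(m)."""
    M = Z.shape[1]
    G = rng.normal(0.0, 1.0, size=(M, m)) / np.sqrt(m)
    return Z @ G, G


def kernel_aggregate(P_train, y_train, P_query, h):
    """Step 2 (assumed form): Gaussian-kernel weighted average of the
    training responses, with weights based on distances between
    projected prediction vectors."""
    diff = P_query[:, None, :] - P_train[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Subtract the row-wise minimum so the largest weight is exactly 1;
    # this leaves the normalized weights unchanged and avoids 0/0.
    W = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * h ** 2))
    return (W @ y_train) / W.sum(axis=1)


def main():
    rng = np.random.default_rng(0)
    n, d, M, m = 400, 5, 300, 20  # illustrative sizes only
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

    # Hypothetical split: fit machines / build the aggregation / evaluate.
    fit, agg, test = np.split(rng.permutation(n), [150, 300])

    # M redundant toy machines: least-squares fits on random feature subsets,
    # built plainly with no model selection.
    preds = np.empty((n, M))
    for j in range(M):
        cols = rng.choice(d, size=3, replace=False)
        A = np.column_stack([np.ones(n), X[:, cols]])
        coef, *_ = np.linalg.lstsq(A[fit], y[fit], rcond=None)
        preds[:, j] = A @ coef

    # Project the M-dimensional prediction features, then aggregate.
    P, _ = random_projection(preds, m, rng)
    y_hat = kernel_aggregate(P[agg], y[agg], P[test], h=3.0)  # h is arbitrary
    print("test MSE:", np.mean((y_hat - y[test]) ** 2))


if __name__ == "__main__":
    main()
```

Because the aggregate is a convex combination of observed responses, its output always lies between the smallest and largest training response. The bandwidth `h` and the projected dimension `m` are the two knobs in this sketch; the values here are placeholders rather than recommendations.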