ML · Dec 18, 2016

Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and its Asymptotic Optimality

arXiv:1612.05907v27.836 citations

Originality: Incremental advance

AI Analysis

This solves the tuning parameter selection problem for practitioners using d-KRR on massive datasets, though it is an incremental extension of existing GCV results.

The paper tackles the lack of data-driven tuning methods for divide-and-conquer kernel ridge regression (d-KRR) in large datasets by proposing a distributed Generalized Cross-Validation (dGCV) method, which is shown to be asymptotically optimal for minimizing the true global conditional empirical loss.

Tuning parameter selection is of critical importance for kernel ridge regression. To date, a data-driven tuning method for divide-and-conquer kernel ridge regression (d-KRR) has been lacking in the literature, which limits the applicability of d-KRR to large data sets. In this paper, by modifying the Generalized Cross-Validation (GCV; Wahba, 1990) score, we propose a distributed Generalized Cross-Validation (dGCV) as a data-driven tool for selecting the tuning parameters in d-KRR. Not only is the proposed dGCV computationally scalable for massive data sets, it is also shown, under mild conditions, to be asymptotically optimal in the sense that minimizing the dGCV score is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.
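To make the idea concrete, the following is a minimal sketch of d-KRR with a GCV-style score for choosing the ridge parameter. It is an illustration under stated assumptions, not the paper's implementation: the RBF kernel, the function names (`rbf_kernel`, `fit_dkrr`, `predict_dkrr`, `gcv_style_score`), the `n * lam` ridge scaling, and the toy data are all hypothetical choices, and the score shown is the classical GCV ratio applied to the averaged estimator, which may differ in detail from the dGCV definition in the paper. The sketch relies on the fact that the trace of the averaged estimator's hat matrix is the mean of the local hat-matrix traces, so no global kernel matrix is ever formed.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of A and rows of B (assumed kernel).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_dkrr(X, y, m, lam, gamma=1.0):
    # Fit one KRR per disjoint subset; keep the trace of each local hat matrix.
    fits = []
    for idx in np.array_split(np.arange(len(X)), m):
        Xj, yj = X[idx], y[idx]
        n = len(idx)
        K = rbf_kernel(Xj, Xj, gamma)
        M = K + n * lam * np.eye(n)
        alpha = np.linalg.solve(M, yj)
        # tr(K M^{-1}) equals tr(M^{-1} K), the local effective degrees of freedom.
        trace = np.trace(np.linalg.solve(M, K))
        fits.append((Xj, alpha, trace))
    return fits

def predict_dkrr(fits, Xnew, gamma=1.0):
    # Averaged estimator: mean of the m local KRR predictions.
    return np.mean([rbf_kernel(Xnew, Xj, gamma) @ a for Xj, a, _ in fits], axis=0)

def gcv_style_score(X, y, m, lam, gamma=1.0):
    # Global residuals of the averaged estimator over all N points,
    # penalized by the trace of its hat matrix (mean of local traces).
    fits = fit_dkrr(X, y, m, lam, gamma)
    resid = y - predict_dkrr(fits, X, gamma)
    N = len(y)
    trace_avg = sum(t for _, _, t in fits) / m
    return np.mean(resid**2) / (1.0 - trace_avg / N) ** 2

# Toy usage: choose lam from a small grid by minimizing the score.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
grid = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
scores = [gcv_style_score(X, y, m=4, lam=lam) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
```

In a genuinely distributed setting, each machine would compute its own fit and trace, while the global residuals would require each local estimator to be evaluated on the other subsets' inputs; the sketch does all of this in one process for readability.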
