ML LG · May 24, 2021

Uncertainty quantification for distributed regression

arXiv:2105.11425v1

Originality: Incremental advance

AI Analysis

This work addresses uncertainty quantification for distributed regression and offers a practical tool for large-scale data analysis. It is an incremental advance, since it builds on existing divide-and-conquer methods.

Motivated by the computational cost of scaling kernel ridge regression to large datasets, the paper proposes a fully data-driven method for quantifying the uncertainty of averaged estimators in divide-and-conquer approaches. The method comes with rigorous theoretical guarantees, and the analysis also yields a sup-norm consistency result.

The ever-growing size of datasets renders well-studied learning techniques, such as Kernel Ridge Regression, inapplicable, posing a serious computational challenge. Divide-and-conquer is a common remedy: it splits the dataset into disjoint partitions, computes a local estimate on each partition, and averages the local estimates, which allows an otherwise impractical base approach to scale up. In the current study, we propose a fully data-driven approach to quantifying the uncertainty of the averaged estimator. Namely, we construct simultaneous element-wise confidence bands for the predictions yielded by the averaged estimator on a given deterministic prediction set. The novel approach features rigorous theoretical guarantees for a wide class of base learners, with Kernel Ridge Regression being a special case. As a by-product of our analysis, we also obtain a sup-norm consistency result for divide-and-conquer Kernel Ridge Regression. The simulation study supports the theoretical findings.
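The divide-and-conquer averaging described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions (1-D inputs, a Gaussian kernel, fixed `bandwidth` and regularization `lam`, random disjoint partitions); the function names `rbf_kernel`, `krr_predict` and `dc_krr_predict` are hypothetical. It reproduces only the averaged estimator, not the paper's data-driven confidence-band construction.

```python
import numpy as np

# Hypothetical helpers and hyperparameters; an illustration, not the authors' code.


def rbf_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def krr_predict(x_train, y_train, x_pred, lam=1e-3, bandwidth=0.2):
    """Plain kernel ridge regression.

    Minimizes (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||^2, whose solution is
    alpha = (K + n * lam * I)^{-1} y; predictions are K(x_pred, x_train) @ alpha.
    """
    n = len(x_train)
    gram = rbf_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(gram + n * lam * np.eye(n), y_train)
    return rbf_kernel(x_pred, x_train, bandwidth) @ alpha


def dc_krr_predict(x, y, x_pred, n_parts=10, lam=1e-3, bandwidth=0.2, seed=0):
    """Divide-and-conquer KRR: fit KRR on disjoint random partitions and average.

    Returns the averaged prediction and the stacked local predictions
    (one row per partition).
    """
    rng = np.random.default_rng(seed)
    # Randomly permute indices, then cut them into n_parts disjoint blocks.
    parts = np.array_split(rng.permutation(len(x)), n_parts)
    local = np.stack([krr_predict(x[p], y[p], x_pred, lam, bandwidth) for p in parts])
    return local.mean(axis=0), local


# Usage on synthetic data (hypothetical setup, not the paper's simulation study).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)
x_pred = np.linspace(0.05, 0.95, 50)  # a fixed, deterministic prediction set
avg, local = dc_krr_predict(x, y, x_pred, n_parts=10)
sup_err = np.max(np.abs(avg - np.sin(2 * np.pi * x_pred)))  # sup-norm error on x_pred
```

The point of the split is cost: a full KRR solve on n points takes O(n^3) time, while each of the m local solves takes O((n/m)^3), about O(n^3/m^2) in total. The rows of `local` are the per-partition predictions; their average `avg` is the kind of estimator whose uncertainty on the prediction set the paper sets out to quantify.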

