ML · LG · ST · Dec 13, 2019

Data-driven confidence bands for distributed nonparametric regression

arXiv:1912.06689v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for uncertainty quantification in distributed regression methods, which is crucial for applications handling large-scale data, though it appears incremental as it builds on existing divide-and-conquer approaches.

The paper tackles the problem of quantifying uncertainty in distributed nonparametric regression for massive datasets. It proposes a data-driven algorithm that yields frequentist $L_2$-confidence bands, proves the algorithm's validity, and derives a minimax-optimal high-probability bound for the averaged estimator.
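For concreteness, a frequentist $L_2$-confidence band of level $1-\alpha$ around the averaged estimator can be written as below. The symbols $m$, $\hat f_j$, $f^*$ and $r_\alpha$ are notation assumed here for illustration and are not necessarily the paper's; the coverage guarantee of such procedures may hold only approximately (for example, up to an error term that vanishes as the sample grows).

```latex
\hat f = \frac{1}{m}\sum_{j=1}^{m} \hat f_j, \qquad
\mathcal{C}_\alpha = \bigl\{ f : \|f - \hat f\|_{L_2} \le r_\alpha \bigr\}, \qquad
\mathbb{P}\bigl(f^* \in \mathcal{C}_\alpha\bigr) \ge 1 - \alpha
```

Here $\hat f_j$ is the local estimate computed on the $j$-th of $m$ data partitions, $f^*$ is the true regression function, and the radius $r_\alpha$ is the quantity a data-driven procedure has to compute from the sample.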

Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity, which renders them impractical for modern massive datasets. To address this, a number of approximations have been suggested, some of which allow a distributed implementation. One of them is the divide-and-conquer approach, which splits the data into a number of partitions, computes local estimates, and finally averages them. In this paper we propose a novel, computationally efficient, fully data-driven algorithm that quantifies the uncertainty of this method and yields frequentist $L_2$-confidence bands. We rigorously demonstrate the validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.
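The divide-and-conquer scheme described in the abstract (split, fit locally, average) can be sketched in Python with NumPy. This is a minimal sketch under assumptions not taken from the paper: one-dimensional inputs, a Gaussian (RBF) kernel, kernel ridge regression as the local estimator, and hypothetical helper names (`rbf_kernel`, `krr_fit_predict`, `divide_and_conquer_krr`, `l2_distance`) and hyperparameters. It only computes the averaged estimator and its empirical $L_2$ error against a known test function; it does not implement the paper's data-driven confidence-band procedure.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    # Gaussian (RBF) kernel matrix between 1-D input arrays a and b (assumed kernel choice)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def krr_fit_predict(x_train, y_train, x_test, lam=1e-3, lengthscale=0.2):
    # Kernel ridge regression: alpha = (K + n*lam*I)^{-1} y, prediction = K(x_test, x_train) @ alpha
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, lengthscale)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf_kernel(x_test, x_train, lengthscale) @ alpha


def divide_and_conquer_krr(x, y, x_test, m, lam=1e-3, lengthscale=0.2, seed=0):
    # Randomly split the sample into m partitions, fit KRR on each, and average the local predictions
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), m)
    local = np.stack([krr_fit_predict(x[idx], y[idx], x_test, lam, lengthscale) for idx in parts])
    return local.mean(axis=0), local


def l2_distance(f_vals, g_vals):
    # Approximate L2 distance on [0, 1] (uniform measure) from values on a uniform grid
    return float(np.sqrt(np.mean((f_vals - g_vals) ** 2)))


# Usage: synthetic data from a known regression function with Gaussian noise (hypothetical setup)
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 1.0, n)


def f_true(t):
    return np.sin(2 * np.pi * t)


y = f_true(x) + 0.3 * rng.standard_normal(n)
grid = np.linspace(0.0, 1.0, 200)
f_avg, local = divide_and_conquer_krr(x, y, grid, m=10)
err = l2_distance(f_avg, f_true(grid))
print(f"empirical L2 error of the averaged estimator: {err:.3f}")
```

With $m$ partitions of size $n/m$, each local solve costs $O((n/m)^3)$ instead of $O(n^3)$ for the full kernel ridge problem, and the local fits are independent, so they could run on separate machines. The regularization `lam` is kept small here because averaging reduces variance but not the bias of the local fits.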
