Lower Bounds and a Near-Optimal Shrinkage Estimator for Least Squares using Random Projections
This work addresses efficient and accurate large-scale linear regression, a concern for data scientists and engineers, and offers incremental improvements over existing sketching methods.
The paper improves the accuracy of least squares solutions computed with random projections. It derives tight error lower bounds for any estimator that uses a Gaussian sketch, and proposes a James-Stein-based shrinkage estimator with lower error than classical sketching, validated empirically on simulated and real datasets.
In this work, we treat deterministic optimization using random projections as a statistical estimation problem, where the error metric is the squared distance between the estimator's predictions and those of the true solution. For approximately solving a large-scale least squares problem using Gaussian sketches, we show that the sketched solution has a conditional Gaussian distribution with the true solution as its mean. First, we derive tight worst-case error lower bounds with explicit constants for any estimator using the Gaussian sketch, and show that classical sketching is the optimal unbiased estimator. For biased estimators, the lower bound also incorporates prior knowledge about the true solution. Second, we use the James-Stein estimator to derive an improved estimator for the least squares solution using the Gaussian sketch. We derive an upper bound on the expected error of this estimator, which is smaller than the error of the classical Gaussian sketch solution for any given data. The upper and lower bounds match when the signal-to-noise ratio (SNR) of the true solution is known to be small and the data matrix is well-conditioned. Empirically, this estimator achieves smaller error on simulated and real datasets, and it works for other common sketching methods as well.
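To make the two estimators concrete, here is a minimal sketch in Python with NumPy. It assumes a generic positive-part James-Stein rule with an estimated noise level, applied to the fitted values of the sketched problem. The function names, the shrinkage constant (d - 2) / (m - d + 2), and the choice to shrink toward zero are illustrative assumptions, not the paper's exact estimator, whose form and constants may differ.

```python
import numpy as np


def gaussian_sketch_solution(A, b, m, rng):
    """Classical sketching: solve min_x ||S A x - S b||_2 for a Gaussian S of shape (m, n)."""
    n, _ = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # i.i.d. N(0, 1/m) entries
    SA, Sb = S @ A, S @ b
    x_hat, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x_hat, SA, Sb


def james_stein_shrink(x_hat, SA, Sb):
    """Hypothetical positive-part James-Stein shrinkage of the sketched solution.

    Assumption (not taken from the paper): the noise level is estimated from the
    sketched residual, and the textbook unknown-variance constant
    (d - 2) / (m - d + 2) is used.
    """
    m, d = SA.shape
    if d < 3 or m <= d:
        raise ValueError("this sketch needs d >= 3 and m > d")
    fitted = SA @ x_hat
    rss = np.sum((Sb - fitted) ** 2)   # sketched residual sum of squares
    signal = np.sum(fitted ** 2)       # squared norm of the sketched fit
    if signal == 0.0:
        return np.zeros_like(x_hat), 0.0
    # Positive part keeps the factor in [0, 1], so the estimate never flips sign.
    factor = max(0.0, 1.0 - (d - 2) / (m - d + 2) * rss / signal)
    return factor * x_hat, factor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 2000, 20, 100
    A = rng.standard_normal((n, d))
    b = A @ (0.05 * rng.standard_normal(d)) + rng.standard_normal(n)  # low-SNR toy data
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # full least squares solution

    x_hat, SA, Sb = gaussian_sketch_solution(A, b, m, rng)
    x_js, factor = james_stein_shrink(x_hat, SA, Sb)

    # Prediction error as in the abstract: squared distance between predictions.
    print("shrinkage factor:", factor)
    print("classical sketch error:", np.sum((A @ (x_hat - x_star)) ** 2))
    print("shrunken sketch error: ", np.sum((A @ (x_js - x_star)) ** 2))
```

The toy data uses a small signal relative to the noise, which is the regime where the abstract says the bounds match. Whether shrinkage helps on a single draw depends on the random sketch, so a fair comparison would average the two errors over many draws.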