Jung Eun Huh

24.4 · NA · Apr 6

Adaptive LSQR Preconditioning from One Small Sketch

Jung Eun Huh, Coralia Cartis, Yuji Nakatsukasa

We propose APLICUR, an adaptive preconditioning framework for large-scale linear least-squares (LLS) problems. Using a single small sketch computed once at initialization, APLICUR incrementally refines a CUR-based preconditioner throughout the Krylov solve, interleaving preconditioning with iteration. This enables early convergence without the need to construct a costly high-quality preconditioner upfront. With a modest sketch dimension (typically 5 to 250), largely independent of both the problem size and numerical rank, APLICUR achieves convergence guarantees that are likewise independent of the sketch size. The method is applicable to general matrices without structural assumptions (e.g., the matrix need not be heavily overdetermined) and is well suited to large, sparse, or numerically low-rank problems. We conduct extensive numerical studies to examine the behavior of the proposed framework and to guide effective algorithmic design choices. Across a range of test problems, APLICUR achieves competitive or improved time-to-accuracy performance compared with established randomized preconditioners, including Blendenpik and Nyström PCG, while maintaining low setup cost and robustness across problem regimes.
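For context on the baselines named above, the following is a minimal sketch of the classical sketch-and-precondition idea that Blendenpik-style methods build on. It is not APLICUR: the preconditioner here is built once, upfront, from a QR factorization of a sketched matrix, and is never refined during the solve. The Gaussian sketch (swapped in for Blendenpik's randomized trigonometric transform), the function name `sketch_precond_lsqr`, and the parameter `sketch_rows` are assumptions made for illustration only.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr


def sketch_precond_lsqr(A, b, sketch_rows, seed=0):
    """Solve min ||Ax - b||_2 with LSQR, right-preconditioned by the
    R factor of a sketched copy of A (dense, tall A assumed)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Gaussian sketch S (sketch_rows x m); an illustrative choice.
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)

    # QR of the small sketched matrix; R acts as a right preconditioner.
    _, R = qr(S @ A, mode="economic")

    # Operator for A R^{-1} and its transpose R^{-T} A^T.
    def matvec(y):
        return A @ solve_triangular(R, y)

    def rmatvec(z):
        return solve_triangular(R, A.T @ z, trans="T")

    op = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec, dtype=A.dtype)

    # Solve in the preconditioned variable y, then map back: x = R^{-1} y.
    y, _, itn = lsqr(op, b, atol=1e-12, btol=1e-12)[:3]
    return solve_triangular(R, y), itn
```

In this baseline the sketch must have at least as many rows as the matrix has columns, so the setup cost grows with the problem width. The abstract's point is that APLICUR avoids this by working from a much smaller sketch and improving the preconditioner as the iteration proceeds.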

ML · Nov 1, 2023

Generalization Bounds for Label Noise Stochastic Gradient Descent

Jung Eun Huh, Patrick Rebeschini

We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and at a rate of $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD) -- which employs parameter-independent Gaussian noise -- under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
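To make the algorithm concrete, here is a minimal sketch of label noise SGD on a toy non-convex problem. It only illustrates the update rule and does not reproduce the paper's setting: the single-neuron model $\tanh(w^\top x)$, the squared loss, the Gaussian label perturbation, and all names and default values (`label_noise_sgd`, `noise_std`, `lr`, `batch`) are assumptions chosen for illustration.

```python
import numpy as np


def label_noise_sgd(X, y, lr=0.1, noise_std=0.1, steps=3000, batch=16, seed=0):
    """SGD with a constant learning rate where fresh noise is added to the
    labels of each minibatch before the gradient is computed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        Xb = X[idx]
        # Perturb the labels with fresh Gaussian noise at every step.
        y_noisy = y[idx] + noise_std * rng.standard_normal(batch)
        pred = np.tanh(Xb @ w)
        # Gradient of 0.5 * (pred - y_noisy)^2 with respect to w.
        # The injected noise enters as noise * (1 - pred^2) * x, so its
        # effect on the update depends on the current parameters w.
        grad = Xb.T @ ((pred - y_noisy) * (1.0 - pred**2)) / batch
        w -= lr * grad
    return w
```

Because the perturbation is multiplied by the model's gradient, the effective noise in each update varies with $w$. This is the contrast the abstract draws with SGLD, whose Gaussian noise is added directly to the parameters and does not depend on them.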

2 Papers