STLGNENov 26, 2024

Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling

arXiv:2411.17567v1h-index: 3
Originality Incremental advance
AI Analysis

This work addresses the convergence inefficiency of FGD, a biologically plausible alternative to gradient descent, for machine learning practitioners seeking faster training without backward passes.

The paper tackles the slow convergence of forward gradient descent (FGD) compared to stochastic gradient descent (SGD) in linear models by showing that using repeated sampling of training samples reduces the suboptimality factor from d to d/(ℓ ∧ d), eliminating it when ℓ ≳ d, and enabling adaptation to low-dimensional input structures.

Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative of gradient descent as it can be computed without backward pass. Considering the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, by a factor $d$ slower than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$ and thus the suboptimality of the rate disappears if $\ell \gtrsim d.$ We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes