LG Nov 5, 2025

Towards Scalable Backpropagation-Free Gradient Estimation

arXiv:2511.03110v1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient gradient computation for deep learning, offering a potential alternative to backpropagation, though it appears incremental as it builds on existing forward-mode methods.

The paper tackles the problem of scaling gradient estimation without backpropagation by introducing a method that reduces bias and variance through manipulation of upstream Jacobian matrices, showing promising results with improved performance as network width increases.

While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations. Existing gradient estimation methods that instead use forward-mode automatic differentiation struggle to scale beyond small networks due to the high variance of the estimates. Efforts to mitigate this have so far introduced significant bias to the estimates, reducing their utility. We introduce a gradient estimation approach that reduces both bias and variance by manipulating upstream Jacobian matrices when computing guess directions. It shows promising results and has the potential to scale to larger networks, indeed performing better as the network width is increased. Our understanding of this method is facilitated by analyses of bias and variance, and their connection to the low-dimensional structure of neural network gradients.
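To make the forward-mode baseline mentioned in the abstract concrete, here is a minimal sketch of the standard "forward gradient" estimator: sample a random guess direction v, obtain the directional derivative in a single forward pass, and return that derivative times v. This is not the paper's method, which shapes the guess directions using upstream Jacobian matrices. It only illustrates the unbiased but high-variance estimator that such methods start from. The `Dual` class, the toy `loss`, and all function names are hypothetical and chosen for illustration.

```python
import math
import random


class Dual:
    """Minimal dual number: a value plus a tangent (directional-derivative) part."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied to the tangent part.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def tanh(d):
    t = math.tanh(d.val)
    return Dual(t, (1.0 - t * t) * d.tan)


def loss(w):
    # Toy scalar loss (an assumption, standing in for a network): tanh(w . x)^2.
    x = [0.5, -1.0, 2.0]
    h = tanh(sum(wi * xi for wi, xi in zip(w, x)))
    return h * h


def forward_gradient(f, w, rng):
    """One-sample estimate (grad f . v) v with v ~ N(0, I); no backward pass."""
    v = [rng.gauss(0.0, 1.0) for _ in w]
    out = f([Dual(wi, vi) for wi, vi in zip(w, v)])
    return [out.tan * vi for vi in v]


def true_gradient(f, w):
    """Exact gradient via one forward pass per coordinate (for comparison only)."""
    return [f([Dual(wj, 1.0 if j == i else 0.0) for j, wj in enumerate(w)]).tan
            for i in range(len(w))]


def averaged_estimate(f, w, n_samples, seed=0):
    """Average many one-sample estimates; the mean approaches the true gradient."""
    rng = random.Random(seed)
    acc = [0.0] * len(w)
    for _ in range(n_samples):
        acc = [a + g for a, g in zip(acc, forward_gradient(f, w, rng))]
    return [a / n_samples for a in acc]
```

A single call to `forward_gradient` is typically far from the true gradient, and the spread of the estimate grows with the number of parameters. That is the scaling problem the abstract describes. Averaging many samples, as in `averaged_estimate`, recovers the gradient only at the cost of many forward passes.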

