LG · Apr 29

A projection-based framework for gradient-free and parallel learning

Andreas Bergmeister, Manish Krishan Lal, Stefanie Jegelka, Suvrit Sra

arXiv:2506.05878 · 14.61 citations · h-index: 65

Predicted impact: top 44% in LG · last 90 days · Originality: Highly original

AI Analysis

This work introduces a new paradigm for neural network training that could benefit researchers and practitioners seeking alternatives to gradient-based methods, particularly for non-differentiable operations or parallel computation.

The authors propose a projection-based framework that reformulates neural network training as a feasibility problem, enabling gradient-free and parallel learning. They implement PJAX, a JAX-based framework, and train MLPs, CNNs, and RNNs on standard benchmarks, presenting the approach as a compelling alternative to gradient-based methods, with advantages in parallelism and in handling non-differentiable operations.

We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.
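To make the feasibility view concrete, here is a minimal sketch in plain JAX of alternating projections for a single linear layer. It is an illustration only, not PJAX's API or the authors' algorithm: the function names (`project_linear_graph`, `project_targets`, `fit_by_alternating_projections`), the synthetic data, and the choice of simple alternating projections are all assumptions made for this example. The variables are the weights `w` and the layer output `y`, and the two constraints are "y equals X @ w" (the elementary operation) and "y equals the targets t".

```python
import jax
import jax.numpy as jnp


def project_linear_graph(X, w0, y0):
    """Euclidean projection of (w0, y0) onto the set {(w, y) : y = X @ w}."""
    d = X.shape[1]
    # Minimizing ||w - w0||^2 + ||X w - y0||^2 gives the normal equations
    # (I + X^T X) w = w0 + X^T y0.
    w = jnp.linalg.solve(jnp.eye(d) + X.T @ X, w0 + X.T @ y0)
    return w, X @ w


def project_targets(w0, y0, t):
    """Euclidean projection onto {(w, y) : y = t}; w is left unchanged."""
    return w0, t


def fit_by_alternating_projections(X, t, num_iters=100):
    """Alternate between the two projections, starting from zeros (hypothetical helper)."""
    init = (jnp.zeros(X.shape[1]), jnp.zeros(X.shape[0]))

    def step(carry, _):
        w, y = carry
        w, y = project_targets(w, y, t)       # enforce the data constraint
        w, y = project_linear_graph(X, w, y)  # enforce the layer constraint
        return (w, y), None

    (w, _), _ = jax.lax.scan(step, init, None, length=num_iters)
    return w


# Synthetic, noise-free data (assumed for illustration) so the two
# constraint sets are guaranteed to intersect.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 3))
w_true = jnp.array([1.0, -2.0, 0.5])
t = X @ w_true

w_fit = fit_by_alternating_projections(X, t)
```

No gradient of a loss is computed anywhere: each step is a closed-form projection that touches only the variables of one constraint. In a deeper network there would be one such constraint per elementary operation, which is what makes the per-constraint projections candidates for parallel execution.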
