Local Propagation in Constraint-based Neural Networks
This work addresses the vanishing gradient problem in deep learning with a novel optimization method that could enable more complex architectures, though the contribution appears incremental.
The paper tackles the vanishing gradient problem in deep neural networks by proposing a constraint-based representation and a fully parallelizable Local Propagation (LP) algorithm that searches for saddle points in an adjoint space of weights, neural outputs, and Lagrange multipliers. Experimental results indicate that LP is a feasible approach for training both shallow and deep networks.
In this paper we study a constraint-based representation of neural network architectures. We cast the learning problem in the Lagrangian framework and we investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints, learning from the available supervisions. The computational structure of the proposed Local Propagation (LP) algorithm is based on the search for saddle points in the adjoint space composed of weights, neural outputs, and Lagrange multipliers. All the updates of the model variables are locally performed, so that LP is fully parallelizable over the neural units, circumventing the classic problem of gradient vanishing in deep networks. The implementation of popular neural models is described in the context of LP, together with those conditions that trace a natural connection with Backpropagation. We also investigate the setting in which we tolerate bounded violations of the architectural constraints, and we provide experimental evidence that LP is a feasible approach to train shallow and deep networks, opening the road to further investigations on more complex architectures, easily describable by constraints.
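The saddle-point search described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single tanh hidden layer, a linear output layer, a squared-error loss, and the hard constraint G = H - tanh(X W1) = 0 on the hidden outputs. The names lagrangian, lp_step, H, Lam, and lr are hypothetical. The weights and hidden outputs take a gradient descent step and the Lagrange multipliers take a gradient ascent step, and every update uses only quantities local to its own layer.

```python
import numpy as np


def lagrangian(X, Y, W1, W2, H, Lam):
    """Loss plus multiplier-weighted architectural constraint (illustrative form)."""
    G = H - np.tanh(X @ W1)  # constraint residual, zero when H matches the layer output
    return 0.5 * np.sum((H @ W2 - Y) ** 2) + np.sum(Lam * G)


def lp_step(X, Y, W1, W2, H, Lam, lr=0.01):
    """One simultaneous update of all variables of the Lagrangian.

    X: (n, d) inputs, Y: (n, k) targets
    W1: (d, h) and W2: (h, k) weights
    H: (n, h) hidden outputs, treated as free variables (one row per example)
    Lam: (n, h) Lagrange multipliers attached to the hidden-layer constraint
    """
    A = np.tanh(X @ W1)
    G = H - A                      # gradient of the Lagrangian w.r.t. Lam
    err = H @ W2 - Y               # output error computed from the free variables H

    grad_W2 = H.T @ err
    grad_H = err @ W2.T + Lam
    grad_W1 = -X.T @ (Lam * (1.0 - A ** 2))

    # Descent on weights and hidden outputs, ascent on multipliers.
    # All gradients are computed from the old values, so the four updates
    # are independent of each other and could run in parallel.
    return (
        W1 - lr * grad_W1,
        W2 - lr * grad_W2,
        H - lr * grad_H,
        Lam + lr * G,
    )
```

In this sketch no error signal is chained through the layers: the gradient for W1 depends only on the input, the multipliers, and the local activation, which is the sense in which the updates are local. When the constraint holds and the multipliers are zero, W1 and Lam are left unchanged by a step.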