Convergence of backpropagation with momentum for network architectures with skip connections
The result provides theoretical convergence guarantees for training networks with skip connections. The contribution is incremental, since it extends existing convergence proofs.
The paper proves that backpropagation with momentum converges for deep neural networks with skip connections (DAG architectures), generalizing prior results for simpler feedforward networks. It includes an example showing improved compression in an autoencoder compared to sequential networks.
We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feedforward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an autoencoder and compare it against sequential feedforward networks under several metrics.
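As a rough illustration of the setting, the sketch below trains a three-weight network with one skip connection by gradient descent with momentum. It is a toy under stated assumptions, not the paper's construction: the network, the data, the learning rate, and the fixed momentum coefficient are all hypothetical, and the adaptive momentum rule analyzed in the paper is not reproduced because the text above does not specify it.

```python
import math

def forward(w, x):
    """Tiny DAG: x -> h = tanh(w[0]*x) -> y, plus a skip edge x -> y."""
    h = math.tanh(w[0] * x)
    y = w[1] * h + w[2] * x  # w[2] weights the skip connection
    return h, y

def loss_and_grad(w, data):
    """Mean squared error and its gradient, computed by manual backpropagation."""
    n = len(data)
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for x, t in data:
        h, y = forward(w, x)
        err = y - t
        loss += 0.5 * err * err / n
        grad[0] += err * w[1] * (1.0 - h * h) * x / n  # path through the hidden unit
        grad[1] += err * h / n
        grad[2] += err * x / n                         # path through the skip edge
    return loss, grad

def train(data, lr=0.1, momentum=0.5, steps=500):
    """Gradient descent with a FIXED momentum coefficient (an assumption)."""
    w = [0.1, 0.1, 0.1]
    delta = [0.0, 0.0, 0.0]  # previous weight update
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w, data)
        losses.append(loss)
        # New update = momentum * previous update - lr * gradient.
        delta = [momentum * d - lr * g for d, g in zip(delta, grad)]
        w = [wi + di for wi, di in zip(w, delta)]
    return w, losses

# Hypothetical target that the skip architecture can represent exactly.
data = [(k / 4.0, math.tanh(k / 4.0) + 0.5 * (k / 4.0)) for k in range(-8, 9)]
w, losses = train(data)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

The skip edge gives the output a direct linear path from the input, so the gradient for `w[2]` does not pass through the `tanh` unit. This is the kind of non-sequential dependency that a DAG formulation has to account for.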