LGMLNov 21, 2019

Volume-preserving Neural Networks

arXiv:1911.09576v31 citations
Originality Incremental advance
AI Analysis

This addresses a fundamental issue in deep learning for practitioners, though it appears incremental as it modifies existing architectures rather than introducing a new paradigm.

The authors tackled the vanishing/exploding gradient problem in deep neural networks by proposing a volume-preserving architecture, achieving stable gradient equilibrium as demonstrated on two standard datasets.

We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. Our approach replaces the standard weight matrix of a neural network with a combination of diagonal, rotational and permutation matrices, all of which are volume-preserving. We introduce a coupled activation function allowing us to preserve volume even in the activation function portion of a neural network layer. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. To demonstrate our architecture we apply our volume-preserving neural network model to two standard datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes