On variation of gradients of deep neural networks
The paper provides theoretical insight into designing neural network architectures for better transformation invariance, but the contribution is incremental because it builds on the existing understanding of network properties.
The authors tackle the problem of understanding how layer sizes affect gradient variation in deep neural networks. They prove that the largest variation occurs when the layer with the fewest nodes changes its activation pattern, a result that can guide the design of architectures with both higher complexity and stronger invariance.
We provide a theoretical explanation of the role of the number of nodes at each layer in deep neural networks. We prove that the largest variation of a deep neural network with the ReLU activation function arises when the layer with the fewest nodes changes its activation pattern. An important implication is that a deep neural network is a useful tool for generating functions whose variation is mostly concentrated in a small area of the input space, near the boundaries corresponding to the layer with the fewest nodes. In turn, this property makes the function more invariant to input transformations. That is, our theoretical result gives a clue about how to design the architecture of a deep neural network to increase complexity and transformation invariance simultaneously.
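The claim about activation patterns can be explored numerically. The following is a minimal sketch, not the authors' method or experiment: it assumes a small fully connected ReLU network with random Gaussian weights, a scalar output, and layer sizes chosen here purely for illustration (a 3-node bottleneck between two 16-node layers). All function names are hypothetical. Inside a region where no activation pattern changes, the network is affine and its input gradient is constant, so the sketch walks along a line segment in input space and records how much the gradient jumps each time exactly one hidden layer changes its pattern.

```python
import numpy as np


def init_network(layer_sizes, rng):
    """Random Gaussian weights and biases for a fully connected network (illustrative choice)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
        b = 0.1 * rng.standard_normal(n_out)
        params.append((W, b))
    return params


def forward(params, x):
    """Scalar output of the ReLU network at input x."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)
    W_out, b_out = params[-1]
    return float((W_out @ h + b_out)[0])


def gradient_and_patterns(params, x):
    """Input gradient of the scalar output, plus each hidden layer's activation pattern."""
    h = x
    jac = np.eye(x.size)  # Jacobian of the current layer's output w.r.t. x
    patterns = []
    for W, b in params[:-1]:
        pre = W @ h + b
        mask = pre > 0  # which nodes of this layer are active
        patterns.append(mask)
        h = np.where(mask, pre, 0.0)
        jac = (W * mask[:, None]) @ jac  # same as diag(mask) @ W @ jac
    W_out, _ = params[-1]
    return (W_out @ jac).ravel(), patterns


def gradient_jumps_along_segment(params, x0, x1, steps=2000):
    """Walk from x0 to x1 and record (layer index, gradient jump norm)
    for every step at which exactly one hidden layer changes its pattern."""
    jumps = []
    prev_grad, prev_pat = gradient_and_patterns(params, x0)
    for t in np.linspace(0.0, 1.0, steps)[1:]:
        x = (1.0 - t) * x0 + t * x1
        grad, pat = gradient_and_patterns(params, x)
        changed = [i for i, (p, q) in enumerate(zip(prev_pat, pat))
                   if not np.array_equal(p, q)]
        if len(changed) == 1:
            jumps.append((changed[0], float(np.linalg.norm(grad - prev_grad))))
        prev_grad, prev_pat = grad, pat
    return jumps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [2, 16, 3, 16, 1]  # hidden layer 1 is the narrowest (assumed setup)
    params = init_network(sizes, rng)
    jumps = gradient_jumps_along_segment(
        params, np.array([-3.0, -3.0]), np.array([3.0, 3.0]))
    for layer in range(len(sizes) - 2):
        sizes_l = [s for i, s in jumps if i == layer]
        mean = np.mean(sizes_l) if sizes_l else float("nan")
        print(f"hidden layer {layer}: {len(sizes_l)} changes, mean jump {mean:.4f}")
```

The printed per-layer averages depend on the random seed, the weight scale, and the segment chosen, so a single run is only suggestive. Averaging over many random networks and segments would give a fairer comparison with the theoretical statement.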