An Optimal Time Variable Learning Framework for Deep Neural Networks
This work addresses vanishing and exploding gradients in DNNs, with an application to an ill-posed 3D-Maxwell's equation, but it appears incremental because it builds on existing networks such as ResNet and DenseNet.
The authors tackle the problem of vanishing and exploding gradients in deep neural networks by introducing a framework that learns a variable time step for each layer, and they apply it to an ill-posed 3D-Maxwell's equation.
Feature propagation in Deep Neural Networks (DNNs) can be associated with nonlinear discrete dynamical systems. The novelty of this paper lies in letting the discretization parameter (time step-size) vary from layer to layer and learning it within an optimization framework. The proposed framework can be applied to any of the existing networks such as ResNet, DenseNet or Fractional-DNN. This framework is shown to help overcome the vanishing and exploding gradient issues. Stability of some of the existing continuous DNNs such as Fractional-DNN is also studied. The proposed approach is applied to an ill-posed 3D-Maxwell's equation.
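To make the idea of a layer-dependent time step concrete, here is a minimal sketch of a residual forward pass. It assumes the standard forward-Euler reading of a ResNet, x_{j+1} = x_j + tau_j * sigma(K_j x_j + b_j), with tanh as the activation. The function names, the choice of tanh, and the plain-list data layout are illustrative assumptions and are not taken from the paper. In the framework described above, the step sizes tau_j would be learned together with the weights, while here they are simply passed in.

```python
import math


def matvec(K, x):
    """Multiply matrix K (a list of rows) by vector x."""
    return [sum(k_ij * x_j for k_ij, x_j in zip(row, x)) for row in K]


def residual_forward(x, weights, biases, taus):
    """Forward pass x_{j+1} = x_j + tau_j * tanh(K_j x_j + b_j).

    `taus` holds one step size per layer. Assumption: in a trainable
    version these would be optimization variables like the weights;
    this sketch only evaluates the forward map for given values.
    """
    if not (len(weights) == len(biases) == len(taus)):
        raise ValueError("need one weight matrix, bias and step size per layer")
    for K, b, tau in zip(weights, biases, taus):
        # Affine map of the current features.
        z = [v + b_i for v, b_i in zip(matvec(K, x), b)]
        # Explicit Euler update with the layer's own step size.
        x = [x_i + tau * math.tanh(z_i) for x_i, z_i in zip(x, z)]
    return x
```

Setting every tau_j to the same constant recovers a fixed-step residual network under these assumptions, and setting a tau_j to zero turns that layer into the identity map, which is one way to see why the step size controls how strongly each layer changes the features.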