OC · Apr 18, 2022
An Optimal Time Variable Learning Framework for Deep Neural Networks
Harbir Antil, Hugo Díaz, Evelyn Herberg
Feature propagation in Deep Neural Networks (DNNs) can be associated with nonlinear discrete dynamical systems. The novelty of this paper lies in letting the discretization parameter (time step-size) vary from layer to layer and learning it within an optimization framework. The proposed framework can be applied to any existing network, such as ResNet, DenseNet, or Fractional-DNN, and is shown to help overcome the vanishing and exploding gradient issues. The stability of some existing continuous DNNs, such as Fractional-DNN, is also studied. The proposed approach is applied to an ill-posed 3D Maxwell's equation problem.
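To make the idea of a learnable, layer-dependent step size concrete, here is a minimal sketch, not the authors' implementation. It assumes a ResNet-type layer read as one explicit Euler step, y_{j+1} = y_j + tau_j * tanh(K_j y_j + b_j), a least-squares loss on the final features, and hypothetical helper names (`forward`, `loss_and_tau_grad`). The gradient with respect to each tau_j is obtained by backpropagating the adjoint p_j through the layers, which is also where vanishing or exploding gradients would show up.

```python
import numpy as np


def forward(y0, Ks, bs, taus):
    """Assumed ResNet-style propagation with a separate step size tau_j per layer."""
    ys, zs = [y0], []
    y = y0
    for K, b, tau in zip(Ks, bs, taus):
        z = K @ y + b
        y = y + tau * np.tanh(z)  # explicit Euler step of size tau_j
        zs.append(z)
        ys.append(y)
    return ys, zs


def loss_and_tau_grad(y0, Ks, bs, taus, target):
    """Loss L = 0.5*||y_N - target||^2 and dL/dtau_j via the backward (adjoint) sweep."""
    ys, zs = forward(y0, Ks, bs, taus)
    p = ys[-1] - target  # p_N = dL/dy_N
    loss = 0.5 * float(p @ p)
    grad = np.zeros(len(taus))
    for j in reversed(range(len(taus))):
        s = np.tanh(zs[j])
        grad[j] = p @ s  # dL/dtau_j = p_{j+1} . tanh(z_j)
        # p_j = (I + tau_j diag(tanh'(z_j)) K_j)^T p_{j+1}
        p = p + Ks[j].T @ (taus[j] * (1.0 - s**2) * p)
    return loss, grad
```

In a training loop, the tau_j would simply be updated alongside K_j and b_j using `grad`; setting every tau_j to zero collapses the network to the identity map.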
12.6 · NA · May 13
A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation
Elle Buser, Julianne Chung, Hugo Díaz et al.
We consider inverse problems with linear forward models and Gaussian priors, but with unknown hyperparameters that may arise from the model, the noise, or the specification of the prior. We model this using a hierarchical Bayes framework, which results in a posterior distribution that is, in general, non-Gaussian and challenging to sample from. Consequently, we use an empirical Bayes framework and compute the maximum a posteriori (MAP) estimate of the hyperparameters from the marginalized posterior distribution. However, this optimization problem is also computationally challenging because it requires repeated evaluation of log-determinants. To address this issue, we propose a Majorization-Minimization with Monte Carlo approach, which we call M$^{3}$C, for hyperparameter estimation. Specifically, we replace the challenging optimization problem with a sequence of simpler ones by using a majorization function (or majorant) for the log-determinant term, combined with a Monte Carlo estimator to approximate the majorant. We provide theoretical results showing that, under certain assumptions, the M$^{3}$C iterates converge with high probability to a critical point of the original cost function. Numerical examples from seismic tomography, super-resolution imaging, and contaminant source identification are provided.
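The following toy sketch illustrates how a majorant plus a Monte Carlo estimator can replace log-determinant evaluations. It is written in the spirit of M$^{3}$C but is not the authors' algorithm; everything in it is an assumption chosen for illustration: a single hyperparameter lam (the noise variance), data y ~ N(0, A A^T + lam I), the concavity bound logdet S(lam) <= logdet S_k + tr(S_k^{-1}(S(lam) - S_k)), and a Hutchinson trace estimator with Rademacher probes. The function names are hypothetical, and the eigendecomposition is only a shortcut to solve the 1-D surrogate in this small example (a large-scale method would avoid it).

```python
import numpy as np


def neg_log_marginal(lam, A, y):
    """J(lam) = 0.5*logdet(S) + 0.5*y^T S^{-1} y with S = A A^T + lam*I (constants dropped)."""
    S = A @ A.T + lam * np.eye(len(y))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * logdet + 0.5 * float(y @ np.linalg.solve(S, y))


def hutchinson_trace_inv(S, n_samples, rng):
    """Monte Carlo estimate of tr(S^{-1}) from Rademacher probes (needs solves, no determinants)."""
    W = rng.choice([-1.0, 1.0], size=(S.shape[0], n_samples))
    return float(np.sum(W * np.linalg.solve(S, W)) / n_samples)


def mm_mc_noise_variance(A, y, lam0, n_iter=50, n_samples=50, seed=0):
    """Toy majorize-minimize loop; n_samples=None uses the exact trace for checking."""
    rng = np.random.default_rng(seed)
    G = A @ A.T
    d, U = np.linalg.eigh(G)
    d = np.clip(d, 0.0, None)  # remove tiny negative round-off
    c2 = (U.T @ y) ** 2  # y^T S(lam)^{-1} y = sum(c2 / (d + lam))
    lam = lam0
    for _ in range(n_iter):
        S_k = G + lam * np.eye(len(y))
        if n_samples is None:
            t_k = float(np.trace(np.linalg.inv(S_k)))
        else:
            t_k = hutchinson_trace_inv(S_k, n_samples, rng)

        # Surrogate (up to constants): 0.5*lam*t_k + 0.5*sum(c2/(d+lam)).
        # Its derivative is increasing in lam, so bisection finds the minimizer.
        def dsurr(l):
            return 0.5 * t_k - 0.5 * np.sum(c2 / (d + l) ** 2)

        lo, hi = 1e-12, 1.0
        while dsurr(hi) < 0.0:
            hi *= 2.0
        if dsurr(lo) >= 0.0:
            lam = lo
            continue
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if dsurr(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
    return lam
```

With the exact trace, each step cannot increase J (the usual majorization-minimization argument); with the Monte Carlo trace, the descent holds only approximately, which is why convergence statements of this kind are probabilistic.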