ML · LG · CO · Jan 29, 2019

Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning

arXiv:1901.10082v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses bottlenecks in gradient-based training of neural networks, offering a theoretical and computational framework for more efficient optimization, although it appears incremental, building on existing regularization methods.

The paper tackles the optimization of deep learning models under local entropy and heat regularization. It introduces variational characterizations that cast both optimizations as a single two-step scheme, which, in the local entropy case, enables gradient-free, parallelizable training through sampling algorithms.

The aim of this paper is to provide new theoretical and computational understanding of two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. Under this unified light, the optimization schemes for local entropy and heat regularized loss differ only in which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This allows one to replace the traditional back-propagation calculation of gradients with sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks.
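The moment-matching step for local entropy can be illustrated with a minimal sketch. It rests on a standard fact: for a fixed covariance τI, KL(ρ ‖ N(m, τI)) equals a constant plus E_ρ|y − m|²/(2τ), so its minimizer over m is the mean m = E_ρ[y]. The sketch assumes the shifted density has the local-entropy form ρ_x(y) ∝ exp(−f(y)) · N(y; x, τI) and estimates its mean by self-normalized importance sampling with a Gaussian proposal. The density form, the choice of sampler, and the names `local_entropy_step` and `quad_loss` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def local_entropy_step(x, loss, tau, n_samples, rng):
    """One gradient-free, moment-matching update for local entropy (sketch).

    Assumed target (illustrative notation, not the paper's code):
        rho_x(y) ∝ exp(-loss(y)) * N(y; x, tau * I)
    The minimiser over m of KL(rho_x || N(m, tau * I)) is m = E_rho[y];
    that mean is estimated by self-normalised importance sampling with
    proposal N(x, tau * I). `loss` must map an (n, d) array to shape (n,).
    """
    d = x.shape[0]
    # Sample the Gaussian proposal centred at the current iterate.
    y = x + np.sqrt(tau) * rng.standard_normal((n_samples, d))
    # rho_x / proposal ∝ exp(-loss(y)): only loss values are needed, no gradients.
    log_w = -loss(y)
    log_w -= log_w.max()  # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    # Moment matching: the new iterate is the weighted sample mean.
    return w @ y


# Toy check with a quadratic loss (hypothetical example, not from the paper).
c = np.array([1.0, 1.0])


def quad_loss(y):
    return 0.5 * np.sum((y - c) ** 2, axis=1)


rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])
for _ in range(50):
    x = local_entropy_step(x, quad_loss, tau=1.0, n_samples=5000, rng=rng)
print(x)  # should be close to c = [1, 1]
```

Each update only evaluates the loss at the sampled points, and those evaluations are independent, so they can be computed in parallel. For the quadratic toy loss with τ = 1, ρ_x is itself Gaussian with mean (x + c)/2, so the exact update is x ↦ (x + c)/2 and the noisy iterates should settle near c.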
