LG | ML | Jun 23, 2020

On Compression Principle and Bayesian Optimization for Neural Networks

arXiv:2006.12714v1 | 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses the fundamental challenge of model generalization in machine learning. It offers a novel approach that could influence neural network optimization, though its application of Bayesian methods appears incremental.

The paper tackles the problem of making generalizable predictions by proposing a compression principle that minimizes total compressed message length, and introduces Bayesian Stochastic Gradient Descent (BSGD) as an optimizer for hyper-parameters, requiring only three parameters for training.
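To make the idea of a total compressed message length concrete, here is a minimal sketch of a generic two-part code: the bits needed to state a model plus the bits needed to encode the data under that model. It is an illustration only, not the paper's Bayesian construction. The Bernoulli model, the function names, and the parameter costs in `candidates` are all assumptions chosen for the example.

```python
import math

def data_bits(data, p_one):
    """Ideal code length in bits of a 0/1 sequence under a Bernoulli(p_one) model.

    Assumes 0 < p_one < 1 so that every symbol has a finite code length.
    """
    return -sum(math.log2(p_one if x == 1 else 1.0 - p_one) for x in data)

def total_bits(data, p_one, param_bits):
    """Two-part message length: bits to state the model plus bits for the data."""
    return param_bits + data_bits(data, p_one)

def best_model(data, candidates):
    """Return the (p_one, param_bits) pair with the shortest total message."""
    return min(candidates, key=lambda c: total_bits(data, c[0], c[1]))

# Hypothetical candidates: (probability of a 1, bits spent describing that parameter).
candidates = [(0.5, 0.0), (0.9, 8.0)]

long_data = [1] * 90 + [0] * 10   # enough data to pay for the extra parameter
short_data = [1, 1, 1, 0]         # too little data to justify it

print(best_model(long_data, candidates))
print(best_model(short_data, candidates))
```

With 100 samples the better-fitting model wins despite its description cost; with 4 samples the cheaper model gives the shorter total message. This is the trade-off between fit and model size that a message-length criterion expresses.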

Finding methods for making generalizable predictions is a fundamental problem of machine learning. By examining the similarities between the prediction problem for unknown data and lossless compression, we have found an approach that gives a solution. In this paper we propose a compression principle, which states that an optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability. Following the compression principle, we use a Bayesian approach to build probabilistic models of data and network definitions. A method to approximate Bayesian integrals using a sequence of variational approximations is implemented as an optimizer for hyper-parameters: Bayesian Stochastic Gradient Descent (BSGD). Training with BSGD is completely defined by setting only three parameters: the number of epochs, the size of the dataset, and the size of the minibatch, which together define the learning rate and the number of iterations. We show that dropout can be used for continuous dimensionality reduction, which makes it possible to find the optimal network dimensions required by the compression principle.
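As a rough illustration of how the three training parameters named in the abstract can fix the number of iterations, the sketch below counts minibatch steps. It assumes that a final partial minibatch counts as one step; the abstract does not state this convention. The rule that maps these parameters to a learning rate is not given here, so it is left out. The function name `iteration_count` is hypothetical.

```python
import math

def iteration_count(num_epochs, dataset_size, minibatch_size):
    """Number of minibatch steps for a full training run.

    Assumption: each epoch visits the whole dataset once, and a final
    partial minibatch still counts as one step (hence the ceiling).
    """
    if min(num_epochs, dataset_size, minibatch_size) <= 0:
        raise ValueError("all three parameters must be positive")
    steps_per_epoch = math.ceil(dataset_size / minibatch_size)
    return num_epochs * steps_per_epoch

# Example: 10 epochs over 60000 samples with minibatches of 128.
print(iteration_count(10, 60000, 128))  # 469 steps per epoch, 4690 in total
```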

