LG · NE · ML · Jan 16, 2013

Training Neural Networks with Stochastic Hessian-Free Optimization

arXiv:1301.3641v3 · 52 citations
Originality: Incremental advance
AI Analysis

This work provides an incremental improvement for machine learning practitioners by bridging stochastic gradient descent and Hessian-free methods to enhance training efficiency.

The paper tackles efficient neural network training with stochastic Hessian-free optimization, which computes both gradients and curvature information on mini-batches and achieves competitive performance on classification and deep autoencoder tasks.

Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent of the dataset size. We modify Martens' HF for these settings and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting. Stochastic Hessian-free optimization gives an intermediary between SGD and HF that achieves competitive performance on both classification and deep autoencoder experiments.
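The core loop described above can be illustrated with a small sketch. It assumes a toy linear least-squares model, where the Gauss-Newton curvature-vector product is J^T(Jv)/m and costs about as much as a gradient. Each step draws a gradient mini-batch, draws a smaller curvature mini-batch from it, and runs a few iterations of conjugate gradient on the damped system (G + λI)d = -g without ever forming G. The helper names (`gauss_newton_vec`, `stochastic_hf_step`, etc.), the subset relationship between the two batches, the fixed damping value, and the omission of dropout and Martens' adaptive damping are all simplifying assumptions for illustration, not the paper's exact algorithm.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gradient(w, batch):
    # Gradient of 0.5 * mean((x.w - y)^2) over the mini-batch.
    g = [0.0] * len(w)
    for x, y in batch:
        r = dot(x, w) - y
        for i, xi in enumerate(x):
            g[i] += r * xi
    return [gi / len(batch) for gi in g]

def gauss_newton_vec(v, batch):
    # Matrix-free curvature-vector product G v = J^T (J v) / m.
    # For a linear model the rows of J are simply the inputs x.
    out = [0.0] * len(v)
    for x, _ in batch:
        jv = dot(x, v)
        for i, xi in enumerate(x):
            out[i] += jv * xi
    return [o / len(batch) for o in out]

def conjugate_gradient(apply_a, b, iters, tol=1e-10):
    # Solve A d = b using only products A p (A must be symmetric positive definite).
    d = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

def stochastic_hf_step(w, data, grad_batch, curv_batch, damping, cg_iters, rng):
    gb = rng.sample(data, grad_batch)   # gradient mini-batch
    cb = rng.sample(gb, curv_batch)     # assumed: curvature batch is a subset of it
    g = gradient(w, gb)
    # Damped curvature product (G + damping * I) v; damping is held fixed here.
    apply_a = lambda v: [gv + damping * vi
                         for gv, vi in zip(gauss_newton_vec(v, cb), v)]
    d = conjugate_gradient(apply_a, [-gi for gi in g], cg_iters)
    return [wi + di for wi, di in zip(w, d)]

def make_data(n, rng):
    # Noise-free targets y = 2*x0 - 3*x1 + 1; the last feature is a bias term.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        data.append((x, 2.0 * x[0] - 3.0 * x[1] + 1.0))
    return data

def loss(w, data):
    return 0.5 * sum((dot(x, w) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(500, rng)
w = [0.0, 0.0, 0.0]
initial = loss(w, data)
for _ in range(20):
    w = stochastic_hf_step(w, data, grad_batch=100, curv_batch=50,
                           damping=1e-2, cg_iters=3, rng=rng)
final = loss(w, data)
print(initial, final, w)
```

Both batch sizes are fixed and independent of the dataset size, which mirrors the setting the abstract describes. Shrinking `curv_batch` makes each CG iteration cheaper at the cost of a noisier curvature estimate.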
