ML, CV, LG · Nov 20, 2017

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

arXiv:1711.07354v1 · 79 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of non-convex optimization in deep learning, offering researchers and practitioners a theoretically grounded alternative to SGD. It appears incremental, since it builds on existing regularization and optimization techniques.

The paper proposes a smooth multi-convex formulation for training deep neural networks, obtained by lifting ReLU into a higher-dimensional space. The resulting block coordinate descent algorithm is proven to converge globally to a stationary point at an R-linear rate. In experiments on MNIST, networks trained this way consistently achieved better test-set error rates than identical architectures trained with the SGD variants in Caffe.

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
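The lifting idea in the abstract can be illustrated with a small NumPy sketch. It relies on the standard identity max(0, x) = argmin over u >= 0 of (u - x)^2, which lets the hidden activations become free non-negative variables U tied to W1 X by a quadratic penalty. The one-hidden-layer objective, the names `objective`, `ridge`, and `bcd_train`, and the parameters `gamma` and `lam` are assumptions made for illustration, not the paper's exact formulation; in particular, the U block here takes a single projected-gradient step instead of the proximal-point updates the paper analyzes.

```python
import numpy as np

def objective(X, Y, W1, U, W2, gamma, lam):
    """Simplified lifted objective (assumed form), with U constrained to be >= 0."""
    fit = 0.5 * np.sum((Y - W2 @ U) ** 2)            # output-layer fit
    link = 0.5 * gamma * np.sum((U - W1 @ X) ** 2)   # penalty replacing U = relu(W1 X)
    reg = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # Tikhonov terms
    return fit + link + reg

def ridge(A, B, lam):
    """Closed-form minimizer of 0.5*||B - W A||^2 + 0.5*lam*||W||^2 over W."""
    G = A @ A.T + lam * np.eye(A.shape[0])
    return np.linalg.solve(G, A @ B.T).T

def bcd_train(X, Y, hidden=8, gamma=1.0, lam=0.1, iters=50, seed=0):
    """Cycle over the blocks U, W1, W2; each subproblem is convex."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((hidden, X.shape[0]))
    W2 = 0.1 * rng.standard_normal((Y.shape[0], hidden))
    U = np.maximum(W1 @ X, 0.0)  # start from the ordinary ReLU activations
    history = [objective(X, Y, W1, U, W2, gamma, lam)]
    for _ in range(iters):
        # U block: one projected-gradient step with step size 1/L, where L is
        # the Lipschitz constant of the gradient in U. Projection onto u >= 0
        # is exactly the ReLU operation.
        L = np.linalg.norm(W2, 2) ** 2 + gamma
        grad = W2.T @ (W2 @ U - Y) + gamma * (U - W1 @ X)
        U = np.maximum(U - grad / L, 0.0)
        # W1 block: ridge regression of U on X (penalty rescaled by gamma).
        W1 = ridge(X, U, lam / gamma)
        # W2 block: ridge regression of Y on U.
        W2 = ridge(U, Y, lam)
        history.append(objective(X, Y, W1, U, W2, gamma, lam))
    return W1, U, W2, history
```

Because the two weight blocks are minimized exactly and the U step uses a 1/L step size, the objective in this sketch does not increase from one sweep to the next. That monotone decrease is the kind of property a convergence argument can build on, though the sketch itself makes no claim about the R-linear rate proved in the paper.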

