ML · LG · Mar 29, 2019

A proof of convergence of multi-class logistic regression network

arXiv:1903.12600v4 · 7 citations
Originality: Synthesis-oriented
AI Analysis

This work offers incremental theoretical insights for researchers in machine learning by formalizing convergence properties of a widely used neural network component.

The paper provides a mathematically rigorous derivation of the gradient for multi-class logistic regression networks and proves the positivity of the second derivative of the cross-entropy loss, enabling the use of convex optimization methods and eliminating the need for L2-regularization to guarantee convergence.

This paper revisits a special type of neural network known under two names. In the statistics and machine learning community it is known as a multi-class logistic regression neural network; in the neural network community it is simply the soft-max layer. Its importance is underscored by its role in deep learning: it serves as the last layer, whose output is the classification of the input patterns, such as images. Our exposition focuses on a mathematically rigorous derivation of the key equation expressing the gradient. A fringe benefit of our approach is a fully vectorized expression, which is the basis for an efficient implementation. The second result of this paper is the positivity of the second derivative of the cross-entropy loss function as a function of the weights. This result proves that optimization methods based on convexity may be used to train this network. As a corollary, we demonstrate that no $L^2$-regularizer is needed to guarantee convergence of gradient descent.
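The vectorized gradient and the convexity claim in the abstract can be illustrated numerically. The sketch below is not the paper's own notation or code. It assumes a bias-free layer with inputs `X` of shape (n, d), weights `W` of shape (d, k), and one-hot labels `Y` of shape (n, k), and it uses the standard softmax cross-entropy gradient `X.T @ (P - Y) / n`. All names and sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(W, X, Y):
    # Mean cross-entropy of softmax(X @ W) against one-hot labels Y.
    Z = X @ W
    Z = Z - Z.max(axis=1, keepdims=True)
    log_p = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return -(Y * log_p).sum() / X.shape[0]

def grad(W, X, Y):
    # Fully vectorized gradient with respect to W (assumed convention).
    return X.T @ (softmax(X @ W) - Y) / X.shape[0]

# Small synthetic problem (hypothetical sizes).
rng = np.random.default_rng(0)
n, d, k = 50, 4, 3
X = rng.normal(size=(n, d))
Y = np.eye(k)[rng.integers(0, k, size=n)]
W = rng.normal(size=(d, k))

# Compare the analytic gradient with central finite differences.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(d):
    for j in range(k):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E, X, Y) - loss(W - E, X, Y)) / (2 * eps)
max_grad_error = np.abs(numeric - grad(W, X, Y)).max()

# Midpoint convexity check: for a convex loss this gap is non-negative.
W2 = rng.normal(size=(d, k))
midpoint_gap = 0.5 * (loss(W, X, Y) + loss(W2, X, Y)) - loss(0.5 * (W + W2), X, Y)
```

A small `max_grad_error` and a non-negative `midpoint_gap` are consistent with the gradient formula and with convexity of the loss in the weights. They are a sanity check on random data, not a substitute for the proof.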
