LG · ST · Dec 8, 2020

On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression

arXiv:2012.04576v5
AI Analysis

This work gives existence guarantees for the maximum likelihood estimate and a convergence rate analysis under gradient descent for multi-class logistic regression, a model widely used in machine learning.

This paper addresses the existence of the maximum likelihood estimate (MLE) for multi-class logistic regression, showing it can be ensured by assigning positive probability to every class in the sample dataset, without requiring data separability. It also provides a constructive estimate of the convergence rate to the MLE under gradient descent by bounding the Hessian's condition number.
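To see why a bound on the Hessian's condition number yields a convergence rate, it may help to recall the textbook gradient descent bound. This is a sketch of the standard result, not the paper's specific estimate; the symbols $f$, $\mu$, $L$, $\kappa$, and $w_t$ are notation assumed here for illustration. Suppose the objective $f$ (for example, the negative log-likelihood) has a Hessian whose eigenvalues lie in $[\mu, L]$ with $\mu > 0$ on a convex region containing the iterates and the minimizer $w^\ast$. Then gradient descent with step size $1/L$ satisfies:

```latex
\[
  w_{t+1} = w_t - \frac{1}{L}\,\nabla f(w_t),
  \qquad
  f(w_t) - f(w^\ast) \;\le\; \Bigl(1 - \frac{1}{\kappa}\Bigr)^{t}\,\bigl(f(w_0) - f(w^\ast)\bigr),
  \qquad
  \kappa = \frac{L}{\mu}.
\]
```

A smaller condition number $\kappa$ therefore gives a faster geometric rate, which is why an explicit bound on $\kappa$ translates into an explicit rate.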

We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one method of ensuring its existence is by assigning positive probability to every class in the sample dataset. The notion of data separability is not needed, which is in contrast to the classical setup of multi-class logistic regression in which each data sample belongs to one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.
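The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's construction: the smoothing weight `eps`, the synthetic data, the choice of the last class as a reference class (logit fixed at 0, so the parameters are identifiable), and the conservative step size `1 / L` are all assumptions made here for illustration. Every sample gets a strictly positive probability for every class, and plain gradient descent is run on the mean cross-entropy.

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def probs(X, W):
    # The last class is a reference class whose logit is fixed at 0
    # (an assumption of this sketch, used to make W identifiable).
    Z = np.hstack([X @ W, np.zeros((X.shape[0], 1))])
    return softmax(Z)

def loss_and_grad(X, Y, W):
    # Mean cross-entropy between soft labels Y and model probabilities P,
    # and its gradient with respect to W (rows of Y sum to 1).
    P = probs(X, W)
    loss = -np.mean(np.sum(Y * np.log(P), axis=1))
    grad = X.T @ (P - Y)[:, :-1] / X.shape[0]
    return loss, grad

def gradient_descent(X, Y, n_iter=5000):
    n, d = X.shape
    k = Y.shape[1]
    # Conservative smoothness bound: diag(p) - p p^T has eigenvalues <= 1,
    # so the Hessian is bounded above by lambda_max(X^T X / n).
    L = np.linalg.eigvalsh(X.T @ X / n).max()
    W = np.zeros((d, k - 1))
    losses = []
    for _ in range(n_iter):
        loss, grad = loss_and_grad(X, Y, W)
        losses.append(loss)
        W -= grad / L  # fixed step size 1 / L
    return W, np.array(losses)

# Synthetic data (hypothetical): intercept plus two Gaussian features.
rng = np.random.default_rng(0)
n, k, eps = 200, 3, 0.1
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
hard = rng.integers(0, k, size=n)

# Soft labels: every class receives positive probability on every sample.
Y = (1 - eps) * np.eye(k)[hard] + eps / k

W_hat, losses = gradient_descent(X, Y)
grad_norm = np.linalg.norm(loss_and_grad(X, Y, W_hat)[1])
print("final loss:", losses[-1], "gradient norm:", grad_norm)
```

Because the step size comes from an upper bound on the Hessian, the recorded losses should not increase from one iteration to the next, and the gradient norm at `W_hat` should be close to zero, consistent with a finite maximizer existing for these soft labels.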
