ML LG Jan 4

Simplex Deep Linear Discriminant Analysis

Maxat Tezekbayev, Arman Bolatov, Zhenisbek Assylbekov

arXiv:2601.01679v1

Originality: Incremental advance

AI Analysis

This paper addresses interpretable and stable deep classification, a concern for machine learning researchers and practitioners. The advance is incremental because it builds on existing Deep LDA methods.

The paper tackles a failure mode of Deep Linear Discriminant Analysis (Deep LDA): training the model by maximum likelihood estimation can lead to degenerate solutions and poor classification performance. By constraining the class means to the vertices of a regular simplex and using a spherical covariance, the authors obtain stable training and accuracy competitive with softmax baselines on image datasets (Fashion-MNIST, CIFAR-10, and CIFAR-100).

We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates. Batchwise moment re-estimation of the LDA parameters does not remove this failure mode. We then propose a constrained Deep LDA formulation that fixes the class means to the vertices of a regular simplex in the latent space and restricts the shared covariance to be spherical, leaving only the priors and a single variance parameter to be learned along with the encoder. Under these geometric constraints, MLE becomes stable and yields well-separated class clusters in the latent space. On images (Fashion-MNIST, CIFAR-10, CIFAR-100), the resulting Deep LDA models achieve accuracy competitive with softmax baselines while offering a simple, interpretable latent geometry that is clearly visible in two-dimensional projections.
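
The constrained head described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`simplex_means`, `lda_log_joint`, `nll`, `predict`), the `radius` argument, the log-variance parametrization, and the use of the joint log-likelihood log p(z, y) as the training objective are all assumptions made for illustration. The encoder is replaced by random stand-in latent codes.

```python
import numpy as np


def simplex_means(num_classes, dim, radius=1.0):
    """Return (num_classes, dim) vertices of a regular simplex centred at 0.

    Hypothetical helper: needs dim >= num_classes - 1; `radius` is an
    illustrative knob, not a parameter taken from the paper.
    """
    c = num_classes
    if c < 2 or dim < c - 1:
        raise ValueError("need num_classes >= 2 and dim >= num_classes - 1")
    # Centring the standard basis of R^c gives c equidistant points that
    # sum to zero and span a (c - 1)-dimensional subspace.
    centered = np.eye(c) - 1.0 / c
    # The left singular vectors give coordinates of those points in c - 1 dims.
    u, s, _ = np.linalg.svd(centered)
    coords = u[:, : c - 1] * s[: c - 1]
    coords *= radius / np.linalg.norm(coords, axis=1, keepdims=True)
    means = np.zeros((c, dim))
    means[:, : c - 1] = coords  # pad with zeros if dim > c - 1
    return means


def log_softmax(v):
    """Numerically stable log-softmax of a 1-D array."""
    m = v.max()
    return v - (m + np.log(np.exp(v - m).sum()))


def lda_log_joint(z, means, prior_logits, log_var):
    """log p(z, y=c) under N(z; mu_c, var * I), shape (n, num_classes)."""
    d = z.shape[1]
    var = np.exp(log_var)
    sq_dist = ((z[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    log_gauss = -0.5 * d * (np.log(2.0 * np.pi) + log_var) - 0.5 * sq_dist / var
    return log_softmax(prior_logits)[None, :] + log_gauss


def nll(z, y, means, prior_logits, log_var):
    """Mean negative joint log-likelihood (assumed training objective)."""
    scores = lda_log_joint(z, means, prior_logits, log_var)
    return -scores[np.arange(len(y)), y].mean()


def predict(z, means, prior_logits, log_var):
    """Assign each latent code to the class with the highest joint score."""
    return lda_log_joint(z, means, prior_logits, log_var).argmax(axis=1)


# Usage with stand-in latent codes in place of encoder outputs.
means = simplex_means(num_classes=10, dim=9)
rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=32)
z = means[y] + 0.1 * rng.standard_normal((32, 9))
loss = nll(z, y, means, np.zeros(10), np.log(0.01))
```

In this sketch only `prior_logits` and `log_var` would be trainable alongside the encoder, which mirrors the abstract's statement that only the priors and a single variance parameter are learned. Because all simplex vertices have the same norm, the class posterior logits reduce to the log prior plus the inner product of the mean with z divided by the variance, so the decision boundaries stay linear in the latent space.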
