LG · CV · Mar 17, 2025

A Convex formulation for linear discriminant analysis

arXiv:2503.13623v1
Originality: Incremental advance
AI Analysis

This method addresses the need for scalable and reliable dimensionality reduction in high-dimensional settings such as bioinformatics and computer vision, although it appears to be an incremental advance over traditional LDA approaches.

The authors tackle supervised dimensionality reduction by proposing Convex Linear Discriminant Analysis (ConvexLDA), which optimizes a convex cost function that balances class cohesion against class separation, and they report better performance than existing LDA-based methods on high-dimensional biological and image datasets.

We present a supervised dimensionality reduction technique called Convex Linear Discriminant Analysis (ConvexLDA). The proposed model optimizes a multi-objective cost function that balances two complementary terms. The first term pulls the samples of each class towards their centroid by minimizing each sample's distance from its class centroid in the low-dimensional space. The second term pushes the classes far apart by maximizing their hyperellipsoid scattering volume, measured by the logarithm of the determinant (log det) of the outer-product matrix formed by the low-dimensional class centroids. Taking the negative of the log det term, we pose the final cost as a minimization problem in which a hyper-parameter λ balances the two terms. We demonstrate that the cost function is convex. Unlike Fisher LDA, the proposed method does not require computing a matrix inverse, and therefore avoids the ill-conditioning that arises when the data dimension is very high, e.g., for RNA-seq data. ConvexLDA also does not require pairwise distance computations, which makes it faster and easier to scale. Moreover, the convexity of the cost function guarantees that any local minimum is a global minimum, which enhances the reliability of the learned embedding. Our experimental evaluation demonstrates that ConvexLDA outperforms several popular linear discriminant analysis (LDA)-based methods on a range of data sets, including high-dimensional biological and image data.
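The abstract describes the two terms of the cost only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes a linear projection W (d × p) with orthonormal columns, a cohesion term equal to the mean squared distance between each projected sample and its projected class centroid, and a separation term equal to the log det of the p × p matrix Σ_j z_j z_jᵀ built from the projected centroids z_j = Wᵀμ_j. The orthonormality constraint (enforced by a QR step after each plain gradient update) is an assumption added here so the objective stays bounded below when d > n; with it, this sketch is not claimed to be convex. The helper names `class_centroids`, `convexlda_style_cost` and `fit_projection` are hypothetical. Because the centroid matrix has at most c independent rows, p should not exceed the number of classes c.

```python
import numpy as np


def class_centroids(X, y):
    """Sorted class labels and the (c, d) matrix of class means."""
    classes = np.unique(y)
    M = np.stack([X[y == k].mean(axis=0) for k in classes])
    return classes, M


def convexlda_style_cost(W, X, y, lam):
    """Assumed cost: mean squared distance of each projected sample to its
    projected class centroid, minus lam * log det(sum_j z_j z_j^T), where
    z_j = W^T mu_j are the projected class centroids."""
    classes, M = class_centroids(X, y)
    D = (X - M[np.searchsorted(classes, y)]) @ W   # projected deviations, (n, p)
    cohesion = np.mean(np.sum(D ** 2, axis=1))
    Z = M @ W                                      # projected centroids, (c, p)
    sign, logdet = np.linalg.slogdet(Z.T @ Z)      # log "volume" of centroid scatter
    if sign <= 0:
        return np.inf                              # centroids collapsed: undefined
    return cohesion - lam * logdet


def fit_projection(X, y, p, lam=1.0, lr=1e-3, n_iter=300, seed=0):
    """Gradient descent on the assumed cost, keeping W orthonormal with a QR
    step. Only the p x p matrix S is solved against; no d x d matrix is built."""
    rng = np.random.default_rng(seed)
    classes, M = class_centroids(X, y)
    E = X - M[np.searchsorted(classes, y)]         # within-class deviations, (n, d)
    n, d = X.shape
    W, _ = np.linalg.qr(rng.standard_normal((d, p)))
    for _ in range(n_iter):
        AW = E.T @ (E @ W) / n                     # A W, with A = E^T E / n
        Z = M @ W
        BW = M.T @ Z                               # B W, with B = M^T M
        S = Z.T @ Z                                # p x p centroid outer-product matrix
        # grad tr(W^T A W) = 2 A W;  grad log det(W^T B W) = 2 B W S^{-1}
        grad = 2 * AW - 2 * lam * np.linalg.solve(S, BW.T).T
        W, _ = np.linalg.qr(W - lr * grad)         # retract to orthonormal columns
    return W


# Toy data with d > n, the regime the abstract targets (values are synthetic).
rng = np.random.default_rng(0)
c, per_class, d = 3, 15, 200
y = np.repeat(np.arange(c), per_class)
X = rng.normal(scale=2.0, size=(c, d))[y] + rng.normal(size=(c * per_class, d))

# Same seed as fit_projection, so W0 is its starting point.
W0, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((d, 2)))
W = fit_projection(X, y, p=2)
cost_init = convexlda_style_cost(W0, X, y, lam=1.0)
cost_final = convexlda_style_cost(W, X, y, lam=1.0)

# Fisher LDA would invert the d x d within-class scatter, whose rank is <= n - c.
_, M = class_centroids(X, y)
E = X - M[y]
sw_rank = np.linalg.matrix_rank(E.T @ E)
print(f"cost: {cost_init:.2f} -> {cost_final:.2f}; rank(S_w) = {sw_rank} of {d}")
```

The last lines illustrate the motivation given in the abstract: with n = 45 samples and c = 3 classes, the within-class scatter matrix that Fisher LDA inverts has rank at most n − c = 42, far below d = 200, so it is singular. The sketch, by contrast, only solves against the 2 × 2 matrix S, and its cohesion term is computed from distances to centroids rather than from all sample pairs.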

