LG | OC | ML | May 26, 2014

The role of dimensionality reduction in linear classification

arXiv:1405.6444v1 | 5 citations
Originality: Incremental advance
AI Analysis

This work aims to improve classification accuracy and efficiency by optimizing the dimensionality reduction mapping and the classifier jointly. The advance is incremental: it combines existing components, an RBF mapping and a linear SVM.

The paper tackles the joint optimization of nonlinear dimensionality reduction and a linear classifier, which is typically a difficult nonconvex problem, by introducing an efficient algorithm using auxiliary coordinates; it achieves classification errors competitive with state-of-the-art methods while being fast and allowing runtime-accuracy trade-offs.

Dimensionality reduction (DR) is often used as a preprocessing step in classification, but usually one first fixes the DR mapping, possibly using label information, and then learns a classifier (a filter approach). Best performance would be obtained by optimizing the classification error jointly over DR mapping and classifier (a wrapper approach), but this is a difficult nonconvex problem, particularly with nonlinear DR. Using the method of auxiliary coordinates, we give a simple, efficient algorithm to train a combination of nonlinear DR and a classifier, and apply it to an RBF mapping with a linear SVM. This alternates steps where we train the RBF mapping and a linear SVM as usual regression and classification, respectively, with a closed-form step that coordinates both. The resulting nonlinear low-dimensional classifier achieves classification errors competitive with the state-of-the-art but is fast at training and testing, and allows the user to trade off runtime for classification accuracy easily. We then study the role of nonlinear DR in linear classification, and the interplay between the DR mapping, the number of latent dimensions and the number of classes. When trained jointly, the DR mapping takes an extreme role in eliminating variation: it tends to collapse classes in latent space, erasing all manifold structure, and lay out class centroids so they are linearly separable with maximum margin.
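The abstract mentions a closed-form step that coordinates the mapping and the classifier but does not state it. The following is a minimal sketch of what such a step could look like for a single training point in the binary case. It assumes a quadratic-penalty objective of the form (mu/2)*||z - F(x)||^2 + C*max(0, 1 - y*(w.z + b)), where z is the auxiliary (latent) coordinate, F(x) is the current output of the DR mapping, and (w, b) is the current linear SVM. The objective scaling, the function name `z_step`, and all parameter names are assumptions for illustration, not taken from the paper.

```python
def z_step(f, y, w, b, C, mu):
    """Closed-form update of one auxiliary coordinate z (binary labels y in {-1, +1}).

    Minimizes (mu/2) * ||z - f||^2 + C * max(0, 1 - y * (w.z + b)) over z,
    where f is the current mapping output F(x) for this point.
    This objective form is an assumption made for the sketch.
    """
    w_norm_sq = sum(wi * wi for wi in w)
    if w_norm_sq == 0.0:
        # A zero weight vector gives no direction to move in: stay at f.
        return list(f)

    # Signed margin of the current projection f under the classifier.
    margin = y * (sum(wi * fi for wi, fi in zip(w, f)) + b)

    # Move along y*w just far enough to reach margin 1,
    # but never further than C/mu (where penalty and hinge slopes balance).
    t = min(max((1.0 - margin) / w_norm_sq, 0.0), C / mu)
    return [fi + t * y * wi for fi, wi in zip(f, w)]


# A point on the decision boundary is pushed to the margin when C/mu is large,
# and only part of the way when the penalty weight mu dominates.
print(z_step([0.0, 0.0], 1, [1.0, 0.0], 0.0, C=10.0, mu=1.0))
print(z_step([0.0, 0.0], 1, [1.0, 0.0], 0.0, C=0.5, mu=1.0))
```

Under this assumed objective, points that already satisfy the margin stay at their projection F(x), while the others move toward the correct side of the hyperplane. In an alternating scheme, the mapping would then be refit by regression onto the updated z values and the SVM retrained on them, as the abstract describes.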

