ML LG · May 18, 2018

Spectral feature scaling method for supervised dimensionality reduction

Momo Matsuda, Keiichi Morikuni, Tetsuya Sakurai

arXiv:1805.07006v1 · 1.98 citations

Originality: Incremental advance

AI Analysis

This work aims to improve classification accuracy for high-dimensional data such as gene expression profiles. It appears incremental, since it builds on existing spectral clustering techniques.

Spectral dimensionality reduction methods do not always yield the desired classification because of irregularities or uncertainties in the data. The authors address this with a supervised method that rescales the features using known labels for a subset of the samples. The method outperformed well-established supervised methods on toy problems and on real-world gene expression data, and the feature scaling tended to improve the clustering and classification accuracies of existing unsupervised methods as the proportion of training data increased.

Spectral dimensionality reduction methods enable linear separations of complex data with high-dimensional features in a reduced space. However, these methods do not always give the desired results due to irregularities or uncertainties of the data. Thus, we consider aggressively modifying the scales of the features to obtain the desired classification. Using prior knowledge on the labels of partial samples to specify the Fiedler vector, we formulate an eigenvalue problem of a linear matrix pencil whose eigenvector has the feature scaling factors. The resulting factors can modify the features of entire samples to form clusters in the reduced space, according to the known labels. In this study, we propose new dimensionality reduction methods supervised using the feature scaling associated with the spectral clustering. Numerical experiments show that the proposed methods outperform well-established supervised methods for toy problems with more samples than features, and are more robust regarding clustering than existing methods. Also, the proposed methods outperform existing methods regarding classification for real-world problems with more features than samples of gene expression profiles of cancer diseases. Furthermore, the feature scaling tends to improve the clustering and classification accuracies of existing unsupervised methods, as the proportion of training data increases.
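The abstract does not state the matrix pencil itself, so the following is only a minimal sketch of the standard background step it builds on: computing the Fiedler vector of a graph Laplacian after the features have been multiplied by per-feature scaling factors. The function name `fiedler_vector`, the parameters `scales` and `sigma`, and the choice of a Gaussian affinity are assumptions made for illustration and are not taken from the paper. In particular, the scaling factors are supplied by hand here, whereas the paper obtains them from an eigenvalue problem.

```python
import numpy as np

def fiedler_vector(X, scales=None, sigma=1.0):
    """Fiedler vector of the graph built from (optionally scaled) features.

    X      : array of shape (n_samples, n_features)
    scales : hypothetical per-feature scaling factors, shape (n_features,)
    sigma  : width of the Gaussian affinity (an assumed kernel choice)
    """
    X = np.asarray(X, dtype=float)
    if scales is not None:
        # Rescale each feature (column) before building the graph.
        X = X * np.asarray(scales, dtype=float)

    # Pairwise squared Euclidean distances, clipped at zero for safety.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity matrix without self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Graph Laplacian L = D - W, with D the diagonal degree matrix.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Solve L v = lambda D v through the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order

    # The eigenvector of the second-smallest eigenvalue, mapped back.
    return d_inv_sqrt * vecs[:, 1]

# Toy data: feature 0 separates two groups, feature 1 is large-scale noise.
rng = np.random.default_rng(0)
group_a = np.column_stack([rng.normal(0.0, 0.1, 10), rng.uniform(-10, 10, 10)])
group_b = np.column_stack([rng.normal(3.0, 0.1, 10), rng.uniform(-10, 10, 10)])
X_toy = np.vstack([group_a, group_b])

# Suppressing the noisy feature lets the sign of the Fiedler vector
# split the samples into the two groups.
v_scaled = fiedler_vector(X_toy, scales=[1.0, 0.0])
```

Setting the second scaling factor to zero is an extreme, hand-picked choice that makes the effect easy to see. It only illustrates why rescaling features can change which clusters the Fiedler vector reveals.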
