AI PF NA
Sep 14, 2017

Fast semi-supervised discriminant analysis for binary classification of large data-sets

Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

arXiv:1709.04794v2
13 citations

AI Analysis

This work addresses scalable binary classification for large datasets in domains like pharmaceuticals, but it appears incremental as it builds on existing semi-supervised discriminant analysis methods.

The authors tackled the problem of high-dimensional data classification by proposing three scalable semi-supervised discriminant analysis algorithms, which achieved good predictive performance and reduced computation time to a few seconds on an industry-scale pharmaceutical dataset.

High-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition was improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and our methods only require a few seconds, significantly improving computation time over the previous state of the art.
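The shift-invariance mentioned in the abstract is a general property of Krylov subspaces: for any scalar shift sigma, the subspace spanned by b, Ab, ..., A^(k-1) b is the same for A and for A + sigma*I. The following is a minimal numerical sketch of that property only, not the authors' algorithm. The helper name `krylov_basis`, the random test matrix, and the values of `n`, `k`, and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np


def krylov_basis(A, b, k):
    """Return an orthonormal basis of span{b, Ab, ..., A^(k-1) b}.

    Hypothetical helper: a plain Arnoldi iteration with modified
    Gram-Schmidt, written for illustration only.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ Q[:, j - 1]
        # Remove the components along the basis vectors found so far.
        for i in range(j):
            w -= (Q[:, i] @ w) * Q[:, i]
        Q[:, j] = w / np.linalg.norm(w)
    return Q


# Assumed toy problem: a small symmetric matrix and a random start vector.
rng = np.random.default_rng(0)
n, k, sigma = 50, 6, 3.5
M = rng.standard_normal((n, n))
A = (M @ M.T) / n
b = rng.standard_normal(n)

Q = krylov_basis(A, b, k)
Q_shift = krylov_basis(A + sigma * np.eye(n), b, k)

# Project the shifted basis onto the unshifted one. If both bases span
# the same subspace, nothing is left over after the projection.
residual = np.linalg.norm(Q_shift - Q @ (Q.T @ Q_shift))
print("projection residual:", residual)
```

A residual near machine precision indicates that both bases span the same subspace. A common use of this property is to build one basis and reuse it for several shifted systems; the abstract does not say whether the paper applies it in exactly this way.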
