Probabilistic classifiers with low rank indefinite kernels
This work addresses scalability issues for researchers and practitioners handling large datasets with indefinite similarity measures, such as in bioinformatics and image retrieval, though it is incremental as it extends existing methods.
The authors tackled the problem of scaling probabilistic classifiers for indefinite similarity data, achieving linear runtime and memory complexity for low-rank indefinite kernels while maintaining similar generalization performance.
Indefinite similarity measures are frequently found in bioinformatics, for example as alignment scores, and are also common in other fields, such as shape measures in image retrieval. Lacking an underlying vector space, the data are given as pairwise similarities only. The few algorithms available for such data do not scale to larger datasets. Among probabilistic batch classifiers, the Indefinite Kernel Fisher Discriminant (iKFD) and the Probabilistic Classification Vector Machine (PCVM) are both effective for this type of data, but both have cubic complexity. Here we propose an extension of iKFD and PCVM such that linear runtime and memory complexity is achieved for low-rank indefinite kernels. Employing the Nyström approximation for indefinite kernels, we also propose a new, almost parameter-free approach to identify the landmarks, restricted to a supervised learning problem. Evaluations on several larger similarity datasets from various domains show that the proposed methods provide similar generalization capabilities while being easier to parametrize and substantially faster for large-scale data.
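As a rough illustration of why a low-rank indefinite kernel permits linear cost, the following minimal sketch applies a Nyström approximation K ≈ K_nm K_mm^+ K_mn to a synthetic indefinite similarity matrix. It is not the iKFD or PCVM extension itself. The function names (`nystroem_pinv`, `nystroem_matvec`), the tolerance `tol`, the synthetic data, and the uniformly random landmark choice are all assumptions made for the example; in particular, random sampling merely stands in for the landmark selection approach proposed here. The pseudo-inverse is built from an eigendecomposition so that negative eigenvalues are kept rather than clipped.

```python
import numpy as np


def nystroem_pinv(K_mm, tol=1e-10):
    """Pseudo-inverse of the (possibly indefinite) landmark block K_mm.

    Eigenvalues of either sign are kept; only those that are
    numerically zero relative to the largest magnitude are dropped.
    """
    K_mm = 0.5 * (K_mm + K_mm.T)  # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    keep = np.abs(eigvals) > tol * np.abs(eigvals).max()
    # Divide each retained eigenvector (column) by its eigenvalue.
    return (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T


def nystroem_matvec(K_nm, K_mm_pinv, v):
    """Compute (K_nm K_mm^+ K_nm^T) v without forming the n x n matrix.

    Evaluated right to left, the cost is O(n m + m^2), i.e. linear in n
    for a fixed number of landmarks m.
    """
    return K_nm @ (K_mm_pinv @ (K_nm.T @ v))


# Synthetic indefinite similarity matrix of rank r (illustrative data only):
# K = X diag(signs) X^T has four positive and two negative eigenvalues.
rng = np.random.default_rng(0)
n, r, m = 200, 6, 20
X = rng.standard_normal((n, r))
signs = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0])
K = (X * signs) @ X.T

# Assumption: landmarks are drawn uniformly at random.
landmark_idx = rng.choice(n, size=m, replace=False)
K_nm = K[:, landmark_idx]          # n x m block of similarities to landmarks
K_mm = K_nm[landmark_idx, :]       # m x m block among the landmarks
K_mm_pinv = nystroem_pinv(K_mm)

v = rng.standard_normal(n)
approx = nystroem_matvec(K_nm, K_mm_pinv, v)
exact = K @ v
max_abs_error = np.max(np.abs(approx - exact))
```

In this sketch only the n x m block `K_nm` is needed after construction; the full matrix `K` is formed solely to check the result. Because the synthetic matrix has exact rank r and m exceeds r, the approximation error is expected to sit at round-off level; for real similarity data the error depends on how quickly the eigenvalue magnitudes decay and on which landmarks are chosen.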