LG · ML · Aug 4, 2014

Multithreshold Entropy Linear Classifier

arXiv:1408.1054v1 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses classification tasks, particularly in cheminformatics, and targets balanced quality measures such as the Matthews correlation coefficient rather than plain accuracy. It is incremental in that it builds on existing linear classifier paradigms.
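To illustrate why a balanced measure can matter, here is a minimal sketch comparing accuracy with the Matthews correlation coefficient on made-up confusion-matrix counts. The counts and function names are hypothetical and are not taken from the paper's experiments.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical imbalanced set: 5 positives, 95 negatives,
# and a classifier that predicts "negative" for everything.
tp, tn, fp, fn = 0, 95, 0, 5
print(accuracy(tp, tn, fp, fn))  # 0.95
print(mcc(tp, tn, fp, fn))       # 0.0
```

The trivial classifier looks strong under accuracy but scores zero under the balanced measure, which is the kind of gap an imbalance-aware objective is meant to avoid.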

The paper addresses linear classification by introducing a multithreshold linear classifier that separates the data with multiple parallel hyperplanes, built on Rényi's quadratic entropy and the Cauchy-Schwarz divergence. It achieves scores similar to or higher than SVM on synthetic data and real UCI datasets, and shows benefits in cheminformatics for ligand activity prediction.
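A decision rule with multiple parallel hyperplanes might look like the following minimal sketch. It assumes the predicted label flips each time the projection onto a shared direction `v` crosses a threshold. The function name, the alternating-label convention, and the example values are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def multithreshold_predict(X, v, thresholds, first_label=-1):
    """Label points by the interval their projection onto v falls into.

    All hyperplanes share the normal vector v, so they are parallel;
    each threshold t defines the hyperplane {x : <v, x> = t}.
    """
    proj = X @ v
    # Count how many thresholds lie at or below each projection.
    crossed = np.searchsorted(np.sort(thresholds), proj, side="right")
    # Assumed convention: the label flips at every threshold crossed.
    return np.where(crossed % 2 == 0, first_label, -first_label)

# Hypothetical 2-D points and two thresholds along the first axis.
X = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
v = np.array([1.0, 0.0])
print(multithreshold_predict(X, v, [-1.0, 1.0]))  # [-1  1 -1]
```

With a single threshold this reduces to an ordinary linear classifier. With two thresholds it can isolate a band of one class between regions of the other, which no single hyperplane can do.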

Linear classifiers separate the data with a hyperplane. In this paper we focus on a novel method for constructing a multithreshold linear classifier, which separates the data with multiple parallel hyperplanes. The proposed model is based on information-theoretic concepts, namely Rényi's quadratic entropy and the Cauchy-Schwarz divergence. We begin with some general properties, including data scale invariance. Then we prove that our method is a multithreshold large margin classifier, which shows the analogy to the SVM, while at the same time it works with a much broader class of hypotheses. Interestingly, the proposed method aims at maximizing a balanced quality measure (such as the Matthews correlation coefficient) as opposed to the very common maximization of accuracy. This feature comes directly from the optimization problem statement and is further confirmed by experiments on the UCI datasets. Our Multithreshold Entropy Linear Classifier (MELC) obtains scores similar to or higher than those given by SVM on both synthetic and real data. We show how the proposed approach can be beneficial for cheminformatics in the task of ligand activity prediction, where, besides better classification results, MELC gives additional insight into the data structure (classes of underrepresented chemical compounds).
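The abstract does not give the objective function, so the following is only a sketch of how the named ingredients could fit together. It assumes each class is projected onto a unit direction `v`, its 1-D density is estimated with a Gaussian kernel, and the Cauchy-Schwarz divergence between the two estimates is computed in closed form. The bandwidth rule (normal-reference rule of thumb) and all function names are assumptions, and the optimization over `v` is omitted.

```python
import numpy as np

def rule_of_thumb_bandwidth(z):
    # Normal-reference rule of thumb for a 1-D Gaussian kernel (assumed choice).
    # Requires a non-constant projection so that the standard deviation is > 0.
    return 1.06 * np.std(z) * len(z) ** (-1 / 5)

def cross_information_potential(a, sa, b, sb):
    # Closed form of the integral of the product of two Gaussian KDEs:
    # the mean over all pairs of N(a_i - b_j; 0, sa^2 + sb^2).
    var = sa**2 + sb**2
    diff = a[:, None] - b[None, :]
    return np.mean(np.exp(-diff**2 / (2 * var)) / np.sqrt(2 * np.pi * var))

def renyi_quadratic_entropy(a, sa):
    # H2(f) = -log of the integral of f^2, for the KDE f built on samples a.
    return -np.log(cross_information_potential(a, sa, a, sa))

def cs_divergence(X_pos, X_neg, v):
    # D_CS(f, g) = -2 log(int f g) - H2(f) - H2(g) for the projected classes.
    v = v / np.linalg.norm(v)
    a, b = X_pos @ v, X_neg @ v
    sa, sb = rule_of_thumb_bandwidth(a), rule_of_thumb_bandwidth(b)
    cross = cross_information_potential(a, sa, b, sb)
    return (-2 * np.log(cross)
            - renyi_quadratic_entropy(a, sa)
            - renyi_quadratic_entropy(b, sb))
```

A direction that separates the projected classes well gives a larger divergence than one along which they overlap. Because this bandwidth scales with the standard deviation of the projection, multiplying all data by a positive constant leaves the value unchanged in this sketch, which is consistent with the scale invariance the abstract mentions.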

