LG · ML · Sep 21, 2021

Classification with Nearest Disjoint Centroids

arXiv:2109.10436v2
Originality: Incremental advance
AI Analysis

This is an incremental advance in classification. It is most relevant to domains such as gene expression analysis, where reaching good accuracy with few features matters.

The paper proposes a nearest disjoint centroid classifier. It defines class centroids on disjoint subsets of features and measures distance with a dimensionality-normalized norm. On simulated data and gene expression data, it achieves lower misclassification rates and/or uses fewer features than competing classifiers.
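The decision rule summarized above can be written as a short sketch. This sketch assumes that the dimensionality-normalized norm of a d-dimensional vector is its Euclidean norm divided by √d; the summary does not spell the definition out. The symbols G_k (disjoint feature subsets), μ_k (class means), and ŷ are notation introduced here, not taken from the paper.

```latex
% ASSUMPTION: dimensionality-normalized norm of v in R^d
\|v\|_{\mathrm{dn}} = \frac{\|v\|_2}{\sqrt{d}}, \qquad v \in \mathbb{R}^d

% G_1, \dots, G_K: pairwise disjoint subsets of the feature indices \{1, \dots, p\}
% \mu_k: training mean of class k; x_{G} keeps only the coordinates in G
\hat{y}(x) = \arg\min_{k \in \{1, \dots, K\}}
  \frac{\bigl\| x_{G_k} - \mu_{k, G_k} \bigr\|_2}{\sqrt{|G_k|}}
```

Under this assumption, dividing by √|G_k| prevents a class with a larger feature subset from being disfavored just because its distance sums more squared deviations.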

In this paper, we develop a new classification method based on the nearest centroid classifier, called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in two respects: (1) the centroids are defined on disjoint subsets of features instead of on all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide a few theoretical results for the method. In addition, we propose a simple algorithm based on adapted k-means clustering that finds the disjoint feature subsets used in our method, and we extend the algorithm to perform feature selection. We evaluate our method against other classification methods on both simulated data and real-world gene expression datasets. The results show that our method can outperform competing classifiers, achieving smaller misclassification rates and/or using fewer features across a variety of settings.
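The abstract does not describe the adapted k-means algorithm in detail, so the sketch below does not reproduce it. It uses a hypothetical one-pass heuristic instead (`assign_features`, our own stand-in). The heuristic splits the features into disjoint class-specific groups. The sketch then classifies with the nearest disjoint centroid rule. It assumes the dimensionality-normalized norm is ||v||_2 / sqrt(d). All function names are illustrative. Only the overall structure comes from the text: class centroids, disjoint feature subsets, and a normalized distance.

```python
from math import sqrt
from statistics import mean, pvariance


def class_centroids(X, y):
    """Mean of every feature within each class (keys in sorted label order)."""
    p = len(X[0])
    cents = {}
    for k in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == k]
        cents[k] = [mean(r[j] for r in rows) for j in range(p)]
    return cents


def assign_features(X, y, cents):
    """HYPOTHETICAL one-pass heuristic, NOT the paper's adapted k-means:
    give each feature to the class it separates best, so the groups are disjoint."""
    groups = {k: [] for k in cents}
    for j in range(len(X[0])):
        # Guard against constant features (zero variance).
        var = pvariance([x[j] for x in X]) or 1.0

        def score(k):
            # Squared deviations from class k's centroid, outside vs inside class k.
            inside = [(x[j] - cents[k][j]) ** 2 for x, lab in zip(X, y) if lab == k]
            outside = [(x[j] - cents[k][j]) ** 2 for x, lab in zip(X, y) if lab != k]
            return (mean(outside) - mean(inside)) / var

        groups[max(cents, key=score)].append(j)
    return groups


def dn_norm(v):
    """ASSUMED dimensionality-normalized norm: ||v||_2 / sqrt(len(v))."""
    return sqrt(sum(t * t for t in v) / len(v))


def predict(x, cents, groups):
    """Nearest disjoint centroid: compare x to each centroid only on that class's features."""
    def dist(k):
        G = groups[k]
        if not G:  # a class with no features can never be chosen in this sketch
            return float("inf")
        return dn_norm([x[j] - cents[k][j] for j in G])

    return min(cents, key=dist)


# Toy data: feature j is high only for the j-th class.
X = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0],
     [0.0, 5.0, 0.0], [0.0, 5.0, 0.0],
     [0.0, 0.0, 5.0], [0.0, 0.0, 5.0]]
y = ["a", "a", "b", "b", "c", "c"]

cents = class_centroids(X, y)
groups = assign_features(X, y, cents)
print(groups)                                   # {'a': [0], 'b': [1], 'c': [2]}
print(predict([4.5, 0.2, 0.1], cents, groups))  # a
```

On this toy data each feature is high for exactly one class, so the heuristic assigns feature 0 to "a", feature 1 to "b", and feature 2 to "c". A class left with no features gets an infinite distance here. That is a choice made for this sketch, not something the paper specifies.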
