Classification based on Topological Data Analysis
This work provides a new, direct TDA-based classification method for researchers and practitioners dealing with complex, imbalanced multi-class datasets, offering an alternative to traditional ML approaches.
This paper introduces a novel algorithm, TDABC, that applies Topological Data Analysis (TDA) directly to multi-class classification problems, including imbalanced datasets. The method constructs a filtered simplicial complex and uses persistent homology to guide the labeling of unlabeled points by majority vote among their labeled neighbors. Evaluated on 8 datasets, it outperforms the baseline k-NN and wk-NN classifiers on average, particularly for entangled and minority classes.
Topological Data Analysis (TDA) is an emerging field that aims to discover topological information hidden in a dataset. TDA tools have commonly been used to create filters and topological descriptors that improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, including imbalanced datasets, without any further ML stage. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent homology is then used to guide the choice of a sub-complex in which each unlabeled point receives the label with the most votes from its labeled neighboring points. To assess the proposed method, 8 datasets were selected with varying degrees of class entanglement, numbers of samples per class, and dimensionality. On average, the proposed TDABC method outperformed the baseline classifiers (wk-NN and k-NN) on each of the computed metrics, especially when classifying entangled and minority classes.
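To make the pipeline concrete, here is a minimal Python sketch. The abstract does not fix several details, so this sketch assumes the following. The filtration is a Vietoris-Rips filtration on Euclidean distances. Only 0-dimensional persistent homology is computed, using union-find. The sub-complex is taken at the average H0 death time. Votes are unweighted. Unlabeled points are labeled over repeated rounds, so newly labeled points can vote later. The function names (`pairwise_edges`, `h0_deaths`, `classify`) are hypothetical. This is an illustration of the idea, not the authors' TDABC implementation.

```python
import math
from collections import Counter


def pairwise_edges(points):
    """All edges (length, i, j) of the complete graph, sorted by Euclidean length."""
    n = len(points)
    edges = [(math.dist(points[i], points[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    edges.sort()
    return edges


def h0_deaths(n, edges):
    """Finite death times of 0-dimensional persistent homology (Rips filtration).

    Every vertex is born at 0; each union-find merge kills one component,
    so the merge lengths are the finite H0 deaths (one class never dies).
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def classify(points, labels, choose=lambda ds: sum(ds) / len(ds)):
    """Hypothetical TDABC-style labeling sketch.

    points: list of coordinate tuples; labels: {index: class} for labeled points.
    choose: picks the filtration value eps from the H0 death times
    (assumption: average death time; the paper's selection rule may differ).
    Returns ({index: class}, eps); unreachable points are left out of the dict.
    """
    n = len(points)
    edges = pairwise_edges(points)
    deaths = h0_deaths(n, edges)
    eps = choose(deaths) if deaths else 0.0
    # 1-skeleton of the Rips sub-complex at eps. In a flag complex the vertices
    # in the star of v are exactly v's neighbours in this graph.
    nbrs = [[] for _ in range(n)]
    for d, i, j in edges:
        if d > eps:
            break
        nbrs[i].append(j)
        nbrs[j].append(i)
    result = dict(labels)
    while True:  # repeat voting rounds until no unlabeled point gains a label
        new = {}
        for v in range(n):
            if v in result:
                continue
            votes = Counter(result[u] for u in nbrs[v] if u in result)
            if votes:
                new[v] = votes.most_common(1)[0][0]  # majority vote
        if not new:
            break
        result.update(new)
    return result, eps


if __name__ == "__main__":
    # Two well-separated toy clusters, one labeled point in each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labeled, eps = classify(pts, {0: "A", 3: "B"})
    print(sorted(labeled.items()))
    # prints [(0, 'A'), (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B'), (5, 'B')]
```

Because the Rips complex is a flag complex, the sketch never needs to build higher-dimensional simplices to find voters. Points with no path to a labeled point in the chosen sub-complex stay unlabeled. When two classes tie, the one first counted among the neighbours wins.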