LG · CG · Nov 9, 2021

A Topological Data Analysis Based Classifier

arXiv:2111.05214v3 · 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses classification challenges, particularly on imbalanced datasets, with a novel TDA-based approach. The advance is incremental because it builds on existing TDA tools.

The paper tackles multi-class classification by applying Topological Data Analysis directly, without an additional machine learning stage. The proposed TDABC method outperforms the KNN and weighted-KNN baselines on average, and it excels at classifying entangled and minority classes in imbalanced datasets.

Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, showing advantages for imbalanced datasets. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent Homology (PH) is applied to guide the selection of a sub-complex where unlabeled points obtain the label with the majority of votes from labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap and imbalanced samples per class. On average, the proposed TDABC method was better than KNN and weighted-KNN. It behaves competitively with Local SVM and Random Forest baseline classifiers in balanced datasets, and it outperforms all baseline methods classifying entangled and minority classes.
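
The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It keeps only the 1-skeleton of a Vietoris-Rips filtration, computes 0-dimensional persistence with a union-find pass, and cuts the filtration at a quantile of the H0 death values, which is an assumed stand-in for the paper's PH-guided sub-complex selection. The function names (`rips_edges`, `h0_death_values`, `classify`) and the `quantile` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def rips_edges(points):
    """Edges of the Vietoris-Rips 1-skeleton, sorted by filtration value (edge length)."""
    return sorted((dist(points[i], points[j]), i, j)
                  for i, j in combinations(range(len(points)), 2))


def h0_death_values(n_points, edges):
    """Death values of the 0-dimensional persistence classes (ascending order).

    Processing edges by increasing length with union-find, every edge that
    merges two connected components kills one H0 class at that length.
    """
    parent = list(range(n_points))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def classify(points, labels, quantile=0.5):
    """Label each unlabeled point (labels[i] is None) by majority vote.

    Assumption: the sub-complex is chosen by cutting the filtration at a
    quantile of the H0 death values. The paper's selection rule may differ.
    """
    edges = rips_edges(points)
    deaths = h0_death_values(len(points), edges)
    cutoff = deaths[int(quantile * (len(deaths) - 1))] if deaths else 0.0

    votes = {i: Counter() for i, lab in enumerate(labels) if lab is None}
    for length, i, j in edges:
        if length > cutoff:
            break  # edges are sorted, so the rest lie outside the sub-complex
        for u, v in ((i, j), (j, i)):
            if u in votes and labels[v] is not None:
                votes[u][labels[v]] += 1

    # A point with no labeled neighbor in the sub-complex stays unlabeled (None).
    return {i: (c.most_common(1)[0][0] if c else None) for i, c in votes.items()}


# Two small clusters with one unlabeled point near the middle of each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5), (5.5, 5.5)]
labels = ["a", "a", "a", "b", "b", "b", None, None]
print(classify(points, labels))
```

Unlike KNN, the neighborhood here is set by a filtration scale rather than a fixed neighbor count, so a point in a sparse minority class is not forced to borrow votes from a dense majority class nearby. A fuller version would use higher-dimensional simplices and persistence intervals, for example through a TDA library.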

Code Implementations: 1 repo
