LG | ML | Aug 13, 2024

AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm

arXiv:2408.06958v3 | 3 citations | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

AuToMATo provides an out-of-the-box clustering solution for applications such as topological data analysis, where parameter tuning is undesirable. The advance is incremental because it builds on existing methods.

The authors address the problem of parameter tuning in clustering algorithms by developing AuToMATo, a persistence-based method whose default parameters perform well across a variety of datasets. In many instances it outperforms other state-of-the-art algorithms even when those algorithms are run with their best parameters.

We present AuToMATo, a novel clustering algorithm based on persistent homology. While AuToMATo is not parameter-free per se, we provide default choices for its parameters that make it into an out-of-the-box clustering algorithm that performs well across the board. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. AuToMATo is motivated by applications in topological data analysis, in particular the Mapper algorithm, where it is desirable to work with a clustering algorithm that does not need tuning of its parameters. Indeed, we provide evidence that AuToMATo performs well when used with Mapper. Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture.
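
To make the persistence-based idea in the abstract concrete, here is a minimal sketch of ToMATo-style merging on 1-D data. It is an illustration under stated assumptions, not the authors' implementation: the function name `tomato_1d`, the parameters `radius` and `tau`, and the neighbour-count density estimate are all hypothetical choices made for this example. In particular, the bootstrapping procedure that AuToMATo uses to separate significant peaks from non-significant ones is not reproduced here; the prominence threshold `tau` is simply passed in by hand.

```python
def tomato_1d(points, radius, tau):
    """ToMATo-style clustering of 1-D points (illustrative sketch only).

    radius: neighbourhood radius (hypothetical parameter for this sketch).
    tau: prominence threshold below which a density peak is merged
         into a higher neighbouring peak (chosen by hand here).
    """
    n = len(points)
    # Neighbourhood graph: connect points that lie within `radius`.
    neighbors = [
        [j for j in range(n) if j != i and abs(points[i] - points[j]) <= radius]
        for i in range(n)
    ]
    # Crude density estimate: neighbour count, including the point itself.
    density = [len(nb) + 1 for nb in neighbors]

    parent = list(range(n))  # union-find forest; every root is a density peak

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-density[i], i))
    rank = {p: k for k, p in enumerate(order)}

    for i in order:
        higher = [j for j in neighbors[i] if rank[j] < rank[i]]
        if not higher:
            continue  # no denser neighbour: i is a local peak
        # Attach i to the cluster of its densest neighbour.
        parent[i] = find(min(higher, key=lambda j: rank[j]))
        # Point i may be a saddle between clusters; merge low-prominence peaks.
        for j in higher:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            hi, lo = (ri, rj) if rank[ri] < rank[rj] else (rj, ri)
            # Prominence of the lower peak: its height above the saddle at i.
            if density[lo] - density[i] < tau:
                parent[lo] = hi

    roots = sorted({find(i) for i in range(n)})
    label = {r: k for k, r in enumerate(roots)}
    return [label[find(i)] for i in range(n)]


# Two small bumps joined by a single bridge point at 0.2.
pts = [0.0, 0.05, 0.1, 0.2, 0.3, 0.35, 0.4]
print(tomato_1d(pts, radius=0.11, tau=0.5))  # low threshold keeps both peaks
print(tomato_1d(pts, radius=0.11, tau=2))    # high threshold merges them
```

The two calls show why the threshold matters: the same data yields two clusters or one depending on `tau`. Removing the need to pick this value by hand is the tuning problem the abstract describes.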

Code Implementations: 1 repo
