LGQMME Dec 23, 2021

Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning

arXiv:2112.13680v1
Originality: Incremental advance
AI Analysis

This work addresses a challenge facing researchers in unsupervised learning, who must make informed choices before they can apply clustering to their data. It appears incremental, since it builds on existing ensemble methods.

The paper tackles the expertise needed to select a clustering algorithm and its hyperparameters, including the number of clusters. It proposes an ensemble clustering framework that automates these decisions with minimal input. The resulting method can determine the number of clusters in a dataset and select a suitable algorithm for it.

Unsupervised learning, and more specifically clustering, suffers from the need for expertise in the field to be of use. Researchers must make careful and informed decisions on which algorithm to use with which set of hyperparameters for a given dataset. Additionally, researchers may need to determine the number of clusters in the dataset, which is unfortunately itself an input to most clustering algorithms. All of this before embarking on their actual subject matter work. After quantifying the impact of algorithm and hyperparameter selection, we propose an ensemble clustering framework which can be leveraged with minimal input. It can be used to determine both the number of clusters in the dataset and a suitable choice of algorithm to use for a given dataset. A code library is included in the Conclusion for ease of integration.
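To make the idea of an ensemble deciding the number of clusters concrete, here is a minimal sketch. It is not the paper's framework: it assumes a simple scheme in which each ensemble member runs k-means on a random subsample, scores every candidate k with the silhouette coefficient, and casts one vote. All names (`vote_on_k`, `make_blobs`, the candidate range, the 80% subsample fraction) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def farthest_first(points, k, rng):
    """Pick k initial centers: one at random, then repeatedly the farthest point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k, rng)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)


def vote_on_k(points, candidates=range(2, 7), members=7, frac=0.8, seed=0):
    """Each member clusters a subsample for every candidate k and votes for the best."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(members):
        sample = rng.sample(points, int(frac * len(points)))
        best_k = max(candidates, key=lambda k: silhouette(sample, kmeans(sample, k, rng)))
        votes[best_k] += 1
    return votes.most_common(1)[0][0], votes


def make_blobs(rng, centers=((0, 0), (10, 0), (0, 10)), n=30, std=0.5):
    """Synthetic 2D data: n Gaussian points around each center (illustrative only)."""
    return [(rng.gauss(cx, std), rng.gauss(cy, std)) for cx, cy in centers for _ in range(n)]


data = make_blobs(random.Random(42))
estimated_k, votes = vote_on_k(data)
print(estimated_k, dict(votes))
```

The same voting pattern could, in principle, be extended to algorithm selection by letting members also vary the clustering algorithm and tallying which one scores best, though how the paper actually combines its ensemble is not stated in this summary.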

Code Implementations: 1 repo