QM LG ME
Aug 16, 2016

A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering

arXiv:1608.04700v1
58 citations
Originality: Incremental advance
AI Analysis

This paper offers an incremental improvement for researchers in fields such as bioinformatics who need automated clustering analysis.

The paper tackles the problem of estimating the number of clusters in hierarchical clustering automatically, without human input. It proposes two data-driven methods that outperform the Gap statistic and Elbow methods in multi-cluster scenarios on simulated and gene expression data.

We propose two new methods for estimating the number of clusters in a hierarchical clustering framework in the hopes of creating a fully automated process with no human intervention. The methods are completely data-driven and require no input from the researcher, and as such are fully automated. They are quite easy to implement and not computationally intensive in the least. We analyze performance on several simulated data sets and the Biobase Gene Expression Set, comparing our methods to the established Gap statistic and Elbow methods and outperforming both in multi-cluster scenarios.
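
For orientation, the Elbow baseline named in the abstract can be sketched as follows. This is not one of the paper's two proposed methods, which the abstract does not describe. It is a minimal illustration of one common way to automate the elbow rule on a hierarchical clustering. The function names, the choice of average linkage, the second-difference criterion, and the synthetic three-blob data are all assumptions made for this example.

```python
import math
import random


def average_linkage_partitions(points):
    """Naive O(n^3) average-linkage agglomeration (illustrative only).

    Returns a dict mapping k -> list of clusters (lists of point indices)
    for every k from len(points) down to 1.
    """
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise Euclidean distance between the two clusters.
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def within_ss(points, clusters):
    """Total within-cluster sum of squared distances to each centroid."""
    dim = len(points[0])
    total = 0.0
    for c in clusters:
        centroid = [sum(points[i][d] for i in c) / len(c) for d in range(dim)]
        total += sum((points[i][d] - centroid[d]) ** 2
                     for i in c for d in range(dim))
    return total


def elbow_k(points, k_max=8):
    """Pick k in 2..k_max with the sharpest bend in the W_k curve.

    The bend is measured by the second difference W_{k-1} - 2*W_k + W_{k+1}
    (an assumed, commonly used formalisation of the elbow rule).
    """
    parts = average_linkage_partitions(points)
    w = {k: within_ss(points, parts[k]) for k in range(1, k_max + 2)}
    curvature = {k: w[k - 1] - 2 * w[k] + w[k + 1]
                 for k in range(2, k_max + 1)}
    return max(curvature, key=curvature.get)


# Synthetic data (assumption): three well-separated 2-D Gaussian blobs.
random.seed(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
          for cx, cy in centers for _ in range(20)]

k_hat = elbow_k(points, k_max=8)
print("estimated number of clusters:", k_hat)
```

Because the second difference is only defined for k of at least 2, this rule can never report a single cluster. That limitation of elbow-style heuristics is worth keeping in mind when reading comparisons such as the one described above.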

