LG · Jun 24, 2021

A review of systematic selection of clustering algorithms and their evaluation

arXiv:2106.12792v1 · 17 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of algorithm selection in data analysis. Its contribution is incremental: it builds on existing literature to systematize the selection process.

The paper tackles the challenge of selecting appropriate clustering algorithms and validation methods for complex data. Drawing on a literature review, it provides a systematic selection logic and assessment criteria that let users choose algorithms suited to the properties of their data.

Data analysis plays an indispensable role in value creation in industry. In this context, cluster analysis can explore given datasets with little or no prior knowledge and identify unknown patterns. As (big) data complexity increases along the dimensions of volume, variety, and velocity, this becomes even more important. Many tools for cluster analysis have been developed over the years, and the variety of clustering algorithms is huge. Because the selection of the right clustering procedure is crucial to the results of the data analysis, users need support on their journey of extracting knowledge from raw data. The objective of this paper is therefore to identify a systematic selection logic for clustering algorithms and corresponding validation concepts. The goal is to enable potential users to choose the algorithm that best fits their needs and the properties of their underlying clustering problem. Moreover, users are supported in selecting the right validation concepts to make sense of the clustering results. Based on a comprehensive literature review, this paper provides assessment criteria for clustering method evaluation and validation concept selection. The criteria are applied to several common algorithms, and the selection of an algorithm is supported by pseudocode-based routines that consider the underlying data structure.

