ML, LG, AP, CO, ME · Jul 14, 2022

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

arXiv:2207.06949v4 · 6 citations · h-index: 8
AI Analysis

This is an incremental review that helps researchers select clustering algorithms for pattern detection in large databases.

This paper reviews and compares widely-used clustering methodologies, evaluating their efficiency on three datasets based on accuracy and complexity to determine their appropriateness for different dataset sizes.
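Clustering accuracy is usually computed by matching cluster ids to ground-truth classes, because a clustering algorithm's labels are arbitrary. The summary does not say how the paper computes accuracy. The following is a minimal sketch that assumes the common "best one-to-one mapping" definition. The function name `clustering_accuracy` is hypothetical. Because the sketch brute-forces every mapping, it is only practical for a handful of clusters. For many clusters, the Hungarian algorithm finds the same optimum efficiently.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Hypothetical helper: fraction of points correctly labelled under the
    best one-to-one mapping from cluster ids to class labels.

    Brute force over all injective mappings, so only suitable for small k.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have equal length")
    if not true_labels:
        raise ValueError("need at least one point")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; a one-to-one mapping is impossible")
    best_hits = 0
    # Try every way of assigning distinct classes to the observed clusters.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    # Cluster 1 -> "a", cluster 0 -> "b" is the best mapping: 4 of 5 correct.
    print(clustering_accuracy(["a", "a", "b", "b", "b"], [1, 1, 0, 0, 1]))
```

Swapping cluster ids does not change the score, which is the property that makes this metric usable for unsupervised output.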

Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-defined clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid that helps researchers identify patterns in the data. In large databases, such patterns may not be easily detectable without a clustering algorithm. This article provides a detailed description of the most widely used clustering methodologies, together with practical guidance on suitable parameter selection and initialization. The article is not only a review of the major elements of the examined clustering techniques. It also compares these algorithms' clustering efficiency on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when they are applied to discrete and continuous observations. The results lead to valuable conclusions about how appropriate each examined clustering technique is for a given dataset size.
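To make "grouping unlabeled objects by similarity" and "initialization" concrete, here is a minimal sketch of one widely used method. It uses k-means (Lloyd's iterations) with k-means++ seeding, and similarity is taken to be squared Euclidean distance. This is an illustrative choice, not a reproduction of the paper's implementations or datasets. The names `sq_dist`, `kmeans_pp_init`, and `kmeans` are hypothetical, and only the Python standard library is used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centre uniform, each later centre drawn with
    probability proportional to its squared distance to the nearest centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing centre
            centres.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # points with zero weight can never be picked here
                centres.append(p)
                break
        else:  # float round-off fallback: take the farthest point
            centres.append(points[max(range(len(points)), key=d2.__getitem__)])
    return [list(c) for c in centres]


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: returns (labels, centres) for k clusters."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centres = kmeans(data, k=2, seed=0)
    print(labels, centres)
```

The choice of `k` and the seeding strategy are exactly the kind of parameter selection and initialization decisions the abstract refers to. With poor seeds, k-means can settle in a worse local optimum, which is why k-means++ or several random restarts are commonly used.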
