LG | Dec 2, 2022

Clustering through Feature Space Sequence Discovery and Analysis

arXiv:2212.00996v1 | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses clustering of high-dimensional data for data science applications, but it appears incremental because it builds on existing change point analysis theory.

The paper tackles the problem of identifying patterns in high-dimensional data without prior knowledge by proposing DCSA, a nonparametric algorithm that converts data to sequences and performs change point analysis for clustering. Experiments on datasets with 4 to 20531 dimensions show the method is robust and provides visually interpretable results.

Identifying patterns in high-dimensional data without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition and thereby finds a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20531, confirm that the method is robust and that its results are visually interpretable.
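The abstract does not say how the path is constructed or which change point test is applied, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' DCSA implementation. It assumes a greedy nearest-neighbour traversal to visit every point exactly once, and it substitutes a crude rule (cut at the largest jumps between consecutive points) for a proper change point analysis. The function names `greedy_path` and `cut_sequence` and the parameter `n_clusters` are hypothetical.

```python
import numpy as np


def greedy_path(X, start=0):
    """Visit every row of X exactly once, always moving to the nearest
    unvisited point (an assumed traversal rule, not taken from the paper).

    Returns the visiting order and the distance of each step along the path.
    """
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    order = [start]
    steps = []
    current = start
    for _ in range(n - 1):
        # Euclidean distance from the current point to every point.
        d = np.linalg.norm(X - X[current], axis=1)
        d[visited] = np.inf  # never revisit a point
        nxt = int(np.argmin(d))
        order.append(nxt)
        steps.append(d[nxt])
        visited[nxt] = True
        current = nxt
    return np.array(order), np.array(steps, dtype=float)


def cut_sequence(order, steps, n_clusters):
    """Cut the path at its (n_clusters - 1) largest steps.

    This is a simple stand-in for change point analysis: a large jump in
    the step sequence is treated as a boundary between fragments.
    """
    if n_clusters > 1:
        cuts = np.argsort(steps)[-(n_clusters - 1):]
    else:
        cuts = np.array([], dtype=int)
    # Step i joins path positions i and i + 1, so a new fragment
    # starts at position i + 1.
    boundary = np.zeros(len(order), dtype=int)
    boundary[cuts + 1] = 1
    fragment_id = np.cumsum(boundary)
    labels = np.empty(len(order), dtype=int)
    labels[order] = fragment_id  # map path positions back to row indices
    return labels
```

For example, calling `order, steps = greedy_path(X)` followed by `labels = cut_sequence(order, steps, 2)` assigns each row of `X` to one of two fragments. Plotting `steps` against path position gives a one-dimensional view of the data in which cluster boundaries show up as spikes, which is one plausible reading of the visual interpretability the abstract mentions.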

