ML, CG, LG | Jun 8, 2022

CCP: Correlated Clustering and Projection for Dimensionality Reduction

arXiv:2206.04189v1 | 15 citations | h-index: 61
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in dimensionality reduction for large, high-dimensional datasets, though it appears incremental as it builds on existing clustering and projection concepts.

The paper tackles the inefficiency of existing dimensionality reduction methods on large datasets with high intrinsic dimensions. It introduces Correlated Clustering and Projection (CCP), a data-domain strategy that avoids matrix diagonalization, and reports validation on benchmark datasets with various machine learning algorithms.

Most dimensionality reduction methods employ frequency-domain representations obtained from matrix diagonalization and may not be efficient for large datasets with relatively high intrinsic dimensions. To address this challenge, Correlated Clustering and Projection (CCP) offers a novel data-domain strategy that requires no matrix diagonalization. CCP partitions high-dimensional features into correlated clusters and then projects the correlated features in each cluster into a one-dimensional representation based on sample correlations. Residue-Similarity (R-S) scores and indexes, the shape of data in Riemannian manifolds, and algebraic topology-based persistent Laplacians are introduced for visualization and analysis. The proposed methods are validated on benchmark datasets with various machine learning algorithms.
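
The two steps described in the abstract (partition features into correlated clusters, then collapse each cluster to one value per sample) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `1 - |Pearson r|` feature distance, the k-medoids partition, and the Gaussian kernel with a mean-distance scale are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np


def feature_distance(X):
    """Feature-feature distance 1 - |Pearson r| for X of shape (n_samples, n_features).

    Assumption: absolute correlation is used as the notion of "correlated features".
    """
    corr = np.atleast_2d(np.nan_to_num(np.corrcoef(X, rowvar=False)))
    D = np.clip(1.0 - np.abs(corr), 0.0, 1.0)
    np.fill_diagonal(D, 0.0)  # a feature is at distance 0 from itself
    return D


def cluster_features(D, n_clusters, n_iter=50, seed=0):
    """Partition features with a plain k-medoids loop on the distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=n_clusters, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)


def project_cluster(Xc):
    """Collapse one feature cluster to a single value per sample.

    Assumption: each sample's value is the sum of Gaussian kernel similarities
    to all other samples, measured in the cluster's feature subspace.
    """
    sq = np.sum(Xc ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xc @ Xc.T, 0.0)
    scale = d2.mean() if d2.mean() > 0 else 1.0
    K = np.exp(-d2 / scale)
    np.fill_diagonal(K, 0.0)  # exclude self-similarity
    return K.sum(axis=1)


def ccp_sketch(X, n_components, seed=0):
    """Return an (n_samples, n_components) embedding, one column per feature cluster."""
    X = np.asarray(X, dtype=float)
    labels = cluster_features(feature_distance(X), n_components, seed=seed)
    Z = np.zeros((X.shape[0], n_components))
    for k in range(n_components):
        cols = np.flatnonzero(labels == k)
        if cols.size:  # an empty cluster leaves its column at zero
            Z[:, k] = project_cluster(X[:, cols])
    return Z
```

Calling `ccp_sketch(X, n_components=3)` on an array with 40 samples and 10 features returns a 40-by-3 array. No eigendecomposition is performed anywhere in the sketch, which is the property the abstract emphasizes; the pairwise sample distances, however, make this naive version quadratic in the number of samples.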

Code Implementations: 2 repos