MS, CO, ML | May 25, 2018

COREclust: a new package for a robust and scalable analysis of complex data

arXiv:1805.10211v1
Originality: Synthesis-oriented
AI Analysis

COREclust gives researchers a robust and scalable tool for analyzing complex, high-dimensional datasets. The contribution is incremental, since it builds on existing clustering methods.

The authors tackled the problem of detecting representative variables in high-dimensional data with limited observations by introducing the COREclust R package, which uses a graph clustering algorithm to identify variable sets and robustly estimate their centers, demonstrating effectiveness on synthetic and real data.

In this paper, we present a new R package, COREclust, dedicated to the detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Variable-set detection is based on an original graph clustering strategy, called the CORE-clustering algorithm, that detects CORE-clusters, i.e. variable sets that have a user-defined size range and in which each variable is very similar to at least one other variable. Representative variables are then robustly estimated as the CORE-cluster centers. This strategy is entirely coded in C++ and wrapped by R using the Rcpp package. Particular effort has been dedicated to keeping its algorithmic cost reasonable so that it can be used on large datasets. After motivating our work, we explain the CORE-clustering algorithm as well as a greedy extension of this algorithm. We then present how to use it and the results obtained on synthetic and real data.
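
To make the idea of a CORE-cluster more concrete, here is a minimal Python sketch of a toy grouping procedure in the same spirit. It is not the CORE-clustering algorithm from the paper and it does not use the COREclust API. Everything in it is an assumption made for illustration: the function name `toy_core_clusters`, the use of absolute Pearson correlation as the similarity, the fixed `threshold`, the union-find merging of the strongest edges first, and the choice of the "center" as the member with the largest total similarity to the rest of its group. It only illustrates the two properties stated in the abstract: groups respect a size range, and each member is strongly similar to at least one other member.

```python
import numpy as np


def toy_core_clusters(X, min_size=2, max_size=4, threshold=0.8):
    """Hypothetical sketch, NOT the COREclust implementation.

    Groups the columns (variables) of X by merging the most similar
    pairs first, never letting a group grow beyond max_size, and keeps
    only groups with at least min_size members.
    """
    n_vars = X.shape[1]
    # Assumption: similarity = absolute Pearson correlation between variables.
    sim = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(sim, 0.0)

    # Candidate edges of the similarity graph, strongest first.
    edges = [
        (sim[i, j], i, j)
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if sim[i, j] >= threshold
    ]
    edges.sort(reverse=True)

    # Union-find structure tracking the current groups and their sizes.
    parent = list(range(n_vars))
    size = [1] * n_vars

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _, i, j in edges:
        ri, rj = find(i), find(j)
        # Merge only if the resulting group stays within the size range.
        if ri != rj and size[ri] + size[rj] <= max_size:
            parent[rj] = ri
            size[ri] += size[rj]

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)

    clusters = []
    for members in groups.values():
        if len(members) >= min_size:
            sub = sim[np.ix_(members, members)]
            # Assumption: the center is the member most similar to the others.
            center = members[int(np.argmax(sub.sum(axis=1)))]
            clusters.append({"members": members, "center": center})
    return clusters


# Synthetic example: few observations, two latent factors, one noise variable.
rng = np.random.default_rng(0)
n_obs = 50
f1 = rng.normal(size=n_obs)
f2 = rng.normal(size=n_obs)
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=n_obs),  # variables 0-2 follow factor 1
    f1 + 0.1 * rng.normal(size=n_obs),
    f1 + 0.1 * rng.normal(size=n_obs),
    f2 + 0.1 * rng.normal(size=n_obs),  # variables 3-4 follow factor 2
    f2 + 0.1 * rng.normal(size=n_obs),
    rng.normal(size=n_obs),             # variable 5 is pure noise
])
clusters = toy_core_clusters(X, min_size=2, max_size=4, threshold=0.8)
for c in clusters:
    print(c)
```

Because every multi-member group is built only from accepted edges, each member is linked to at least one other member with similarity at or above `threshold`. Variables with no strong edge, such as the pure-noise column here, stay out of every group. The actual package may treat the size bounds, the similarity measure and the center estimate quite differently.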
