ML · Apr 6, 2017

Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds

arXiv:1704.01871v1

Originality: Synthesis-oriented

AI Analysis

This work addresses clustering challenges for large-scale data in moderate dimensions. The contribution is incremental: it builds on existing multivariate analytics methodologies.

The paper tackles the problem of clustering massive datasets with moderate to small dimensionality by leveraging dual spaces of observations and attributes, resulting in an efficient processing pipeline for both partitioning and hierarchical clustering.

Cluster analysis of very high dimensional data can benefit from the properties of such high dimensionality. Informally expressed, in this work, our focus is on the analogous situation when the dimensionality is moderate to small, relative to a massively sized set of observations. Mathematically expressed, these are the dual spaces of observations and attributes. The point cloud of observations is in attribute space, and the point cloud of attributes is in observation space. In this paper, we begin by summarizing various perspectives related to methodologies that are used in multivariate analytics. We draw on these to establish an efficient clustering processing pipeline for both partitioning and hierarchical clustering.
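
As a rough illustration of the dual spaces mentioned in the abstract, the sketch below treats one data matrix in two ways: its rows as a cloud of observations in attribute space, and its columns as a cloud of attributes in observation space. The toy values and the helper names `observation_cloud` and `attribute_cloud` are hypothetical and chosen only for illustration; this is not the paper's pipeline.

```python
# Hypothetical toy data: 4 observations (rows) by 2 attributes (columns).
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
]


def observation_cloud(matrix):
    """Each row is one point in attribute space (dimension = number of attributes)."""
    return [tuple(row) for row in matrix]


def attribute_cloud(matrix):
    """Each column is one point in observation space (dimension = number of observations)."""
    return [tuple(col) for col in zip(*matrix)]


obs = observation_cloud(data)
attrs = attribute_cloud(data)

# Number of points and their dimensionality in each of the two clouds.
print(len(obs), len(obs[0]))      # 4 points, each of dimension 2
print(len(attrs), len(attrs[0]))  # 2 points, each of dimension 4
```

In the setting the abstract describes, the number of observations is massive while the number of attributes is moderate to small, so the observation cloud has many low-dimensional points and the attribute cloud has few very high-dimensional points.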
