ML · Aug 17, 2016

Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data

arXiv:1608.04961v3 · 1 citation
Originality Synthesis-oriented
AI Analysis

This paper addresses the challenge of clustering mixed-type data in big data applications, but it appears incremental because it applies an existing method to a common data type.

The paper tackled the problem of clustering datasets with mixed numerical and categorical attributes by using homogeneity analysis to create a Euclidean representation, and experiments indicated this approach is useful for analyzing big datasets.

Datasets with a mixture of numerical and categorical attributes are routinely encountered in many application domains. In this work we examine an approach to clustering such datasets using homogeneity analysis. Homogeneity analysis determines a Euclidean representation of the data. This representation can then be analyzed by leveraging the large body of tools and techniques for data with a Euclidean representation. Experiments conducted as part of this study suggest that this approach can be useful in the analysis and exploration of big datasets with a mixture of numerical and categorical attributes.
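
To make the pipeline in the abstract concrete, here is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions that the summary above does not confirm: homogeneity analysis is computed in its multiple-correspondence-analysis form (an SVD of the standardized indicator matrix), numerical attributes are discretized into quartile bins before encoding, and the resulting Euclidean scores are clustered with a small hand-written k-means. The function names (`indicator_matrix`, `homogeneity_scores`, `kmeans`) and the toy data are hypothetical.

```python
import numpy as np

def indicator_matrix(columns):
    """One-hot encode a list of categorical columns into one 0/1 matrix."""
    blocks = []
    for col in columns:
        levels, codes = np.unique(np.asarray(col).ravel(), return_inverse=True)
        blocks.append(np.eye(len(levels))[codes.ravel()])
    return np.hstack(blocks)

def homogeneity_scores(columns, n_dims=2):
    """Euclidean object scores from an MCA-style SVD of the indicator matrix.

    Assumption: this correspondence-analysis formulation stands in for the
    homogeneity analysis used in the paper.
    """
    G = indicator_matrix(columns)
    P = G / G.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    # Standardized residuals; subtracting outer(r, c) removes the trivial axis.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: one point per object in n_dims dimensions.
    return (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - ctr) ** 2, axis=1) for ctr in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy mixed dataset (hypothetical): two groups of 20 objects each.
rng = np.random.default_rng(0)
height = np.concatenate([rng.normal(0, 1, 20), rng.normal(10, 1, 20)])
color = np.array(["red"] * 20 + ["blue"] * 20)
shape = np.array(["disc"] * 20 + ["cube"] * 20)

# Assumption: numeric attributes are binned into quartiles before encoding.
height_bin = np.digitize(height, np.quantile(height, [0.25, 0.5, 0.75]))

scores = homogeneity_scores([color, shape, height_bin], n_dims=2)
labels = kmeans(scores, k=2)
```

Once `scores` is available, any tool that expects Euclidean input (k-means here, but equally a nearest-neighbour index or a density-based method) can be applied unchanged, which is the practical point the abstract makes.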

