CY LG ML · Nov 28, 2018

A comparison of cluster algorithms as applied to unsupervised surveys

Kathleen Campbell Garwood, Ph.D., Arpit Arun Dhobale

arXiv:1811.12210v21.21 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental comparison of existing clustering methods applied to new survey data for identifying student poverty levels.

The study compared k-means, k-modes, and hierarchical clustering on unsupervised student survey data to identify impoverished students. It found that each method had strengths but did not name a single best approach.

When using data to answer important questions, unsupervised data offers extensive opportunities for insight as well as unique challenges. This study considers student survey data with the specific goal of clustering students into similar groups, with the underlying aim of identifying different poverty levels. Fuzzy logic is considered during the data cleaning and organizing phase, helping to create a logical dependent variable against which the analyses can be compared. Using multiple data reduction techniques, the survey was reduced and cleaned. Finally, multiple clustering techniques (k-means, k-modes, and hierarchical clustering) are applied and compared. Though each method has strengths, the goal was to identify which was most viable when applied to survey data, and specifically when trying to identify the most impoverished students.
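Survey answers are usually categorical, which is why k-modes appears alongside k-means: k-means needs numeric features, while k-modes replaces Euclidean distance with a count of mismatched answers and replaces cluster means with per-item modes. The summary above does not give the authors' preprocessing, parameters, or initialization, so the following is only a minimal pure-Python sketch of the k-modes idea under stated assumptions. The toy responses, the item names, the helper names (`mismatch`, `column_modes`, `farthest_point_init`, `k_modes`), and the deterministic farthest-point seeding are all hypothetical illustrations, not the paper's setup.

```python
from collections import Counter


def mismatch(a, b):
    """Simple-matching dissimilarity: number of survey items answered differently."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent answer per item; ties go to the answer seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def farthest_point_init(rows, k):
    """Hypothetical deterministic seeding (an assumption, not the paper's method):
    start from the first respondent, then repeatedly add the respondent whose
    nearest already-chosen mode is farthest away."""
    modes = [rows[0]]
    while len(modes) < k:
        modes.append(max(rows, key=lambda r: min(mismatch(r, m) for m in modes)))
    return modes


def k_modes(rows, k, max_iter=100):
    """Alternate between assigning rows to the nearest mode and recomputing modes."""
    modes = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: mismatch(r, modes[j])) for r in rows]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous mode if a cluster empties out
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical responses: (skips meals, housing situation, has a job)
responses = [
    ("often", "unstable", "no"),
    ("often", "unstable", "yes"),
    ("often", "unstable", "no"),
    ("never", "stable", "yes"),
    ("never", "stable", "yes"),
    ("rarely", "stable", "yes"),
]

labels, modes = k_modes(responses, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('often', 'unstable', 'no'), ('never', 'stable', 'yes')]
```

The same `mismatch` function could serve as the pairwise dissimilarity for hierarchical clustering; how the paper encoded items for k-means or which linkage it chose is not stated in this summary.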

