LG AP Oct 26, 2020

Data Segmentation via t-SNE, DBSCAN, and Random Forest

arXiv:2010.13682v2
Originality: Synthesis-oriented
AI Analysis

This is an incremental proof of concept for data segmentation, potentially useful for analysts working with clustering and feature analysis in domains like social media.

The researchers combine t-SNE, DBSCAN, and a Random Forest classifier into an end-to-end pipeline that separates data into natural clusters and profiles each cluster by its most important features. Case studies cover the Iris and MNIST data sets and real social media data from Instagram, and the fitted classifier can assign cluster labels to out-of-sample points.

This research proposes a data segmentation algorithm which combines t-SNE, DBSCAN, and Random Forest classifier to form an end-to-end pipeline that separates data into natural clusters and produces a characteristic profile of each cluster based on the most important features. Out-of-sample cluster labels can be inferred, and the technique generalizes well on real data sets. We describe the algorithm and provide case studies using the Iris and MNIST data sets, as well as real social media site data from Instagram. This is a proof of concept and sets the stage for further in-depth theoretical analysis.
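The three stages named in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `segment`, the `eps` and `min_samples` values, the standardization of the embedding, and the choice to drop DBSCAN noise points before training are all assumptions made here. The Random Forest is trained on the original features with the DBSCAN labels as targets, which is what lets it label points that never passed through t-SNE.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def segment(X, eps=0.5, min_samples=5, random_state=0):
    """Hypothetical helper: embed with t-SNE, then cluster with DBSCAN."""
    embedding = TSNE(n_components=2, random_state=random_state).fit_transform(X)
    # Standardize the embedding so eps does not depend on the t-SNE scale
    # (an assumption of this sketch, not something the abstract specifies).
    embedding = StandardScaler().fit_transform(embedding)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)
    return embedding, labels


iris = load_iris()
X_train, X_new = train_test_split(iris.data, test_size=0.2, random_state=0)

# Stages 1 and 2: discover clusters on the training data only.
embedding, labels = segment(X_train)

# Stage 3: learn the cluster labels from the ORIGINAL features.
# DBSCAN marks noise as -1; this sketch simply excludes those points.
mask = labels != -1
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train[mask], labels[mask])

# Cluster profile: features ranked by importance, with per-cluster means.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
for cluster in np.unique(labels[mask]):
    members = X_train[labels == cluster]
    print(f"cluster {cluster} ({len(members)} points)")
    for idx in ranking[:2]:
        print(f"  {iris.feature_names[idx]}: mean {members[:, idx].mean():.2f}")

# Out-of-sample inference: t-SNE has no transform step for new points,
# so the forest assigns cluster labels directly from the raw features.
new_labels = forest.predict(X_new)
print("out-of-sample labels:", new_labels)
```

The number of clusters found depends on `eps`, `min_samples`, and the t-SNE random seed, so these would need tuning for any data set other than this toy example.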

Code Implementations: 1 repo
