LG · ML · Feb 21, 2019

Deep Learning Multidimensional Projections

arXiv:1902.07958v1 · 77 citations
Originality: Incremental advance
AI Analysis

The paper addresses efficiency and usability problems faced by machine learning and data visualization researchers and practitioners who work with large datasets.

The paper tackles three weaknesses of t-SNE and similar dimensionality reduction methods: computational expense, instability, and the lack of out-of-sample handling. It proposes a deep learning approach that trains a neural network on sample projections; the resulting projections are computed two to three orders of magnitude faster and handle out-of-sample data stably.

Dimensionality reduction methods, also known as projections, are frequently used for exploring multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ability to visually separate distinct data clusters. However, such methods are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network based on a collection of samples from a given data universe, and their corresponding projections, and then use the network to infer projections of data from the same, or similar, universes. Our approach generates projections with characteristics similar to those of the learned ones, is computationally two to three orders of magnitude faster than SNE-class methods, has no complex-to-set user parameters, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high-dimensional datasets from machine learning.
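
The train-then-infer idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The synthetic clustered data, the one-hidden-layer network, the learning rate, and the epoch count are all assumptions made for illustration. The teacher projection is 2-D PCA rather than t-SNE: PCA has an exact out-of-sample mapping, so the network's predictions for unseen points can be checked against a ground truth, which t-SNE does not provide.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- 1. A toy "data universe": three Gaussian clusters in 10 dimensions ---
centers = 2.0 * rng.standard_normal((3, 10))

def sample(n_per_cluster):
    """Draw n_per_cluster points around each cluster centre."""
    return np.vstack([c + 0.5 * rng.standard_normal((n_per_cluster, 10))
                      for c in centers])

X_train, X_new = sample(100), sample(30)

# Standardise features with statistics from the training sample only.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_new = (X_train - mu) / sigma, (X_new - mu) / sigma

# --- 2. Teacher projection (stand-in: 2-D PCA fitted on the training sample) ---
_, _, vt = np.linalg.svd(X_train, full_matrices=False)

def teacher(X):
    """Project onto the first two principal axes of the training sample."""
    return X @ vt[:2].T

Y_train = teacher(X_train)
y_scale = Y_train.std(axis=0)            # rescale targets to unit spread
Y_train = Y_train / y_scale
Y_new_true = teacher(X_new) / y_scale    # ground truth for unseen points

# --- 3. Student network: one hidden ReLU layer, full-batch gradient descent ---
def train_mlp(X, Y, hidden=32, lr=0.01, epochs=3000):
    n, d = X.shape
    W1 = rng.standard_normal((d, hidden)) * np.sqrt(2.0 / d)
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, Y.shape[1])) * np.sqrt(1.0 / hidden)
    b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.maximum(0.0, X @ W1 + b1)   # hidden activations
        G = (H @ W2 + b2 - Y) / n          # gradient of 0.5 * mean squared error
        GH = (G @ W2.T) * (H > 0)          # back-propagate through the ReLU
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return lambda Xq: np.maximum(0.0, Xq @ W1 + b1) @ W2 + b2

student = train_mlp(X_train, Y_train)

# --- 4. Out-of-sample inference: one forward pass, no re-optimisation ---
Y_new_pred = student(X_new)

def rmse(A, B):
    return float(np.sqrt(np.mean((A - B) ** 2)))

rmse_new = rmse(Y_new_pred, Y_new_true)
rmse_baseline = rmse(np.zeros_like(Y_new_true), Y_new_true)  # always predict the centre
print(f"out-of-sample RMSE: {rmse_new:.3f} (baseline {rmse_baseline:.3f})")
```

To learn a different projection in this sketch, `Y_train` would be replaced by that technique's 2-D coordinates for the training sample (for example, a t-SNE layout), while the inference step stays a single forward pass. Without an exact out-of-sample mapping, the quality of the inferred points would then have to be judged by other means.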

