LG · ML · Jun 6, 2024

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

arXiv:2406.04421v1
Originality: Incremental advance
AI Analysis

The paper provides a semi-supervised method for extending supervised embeddings to unseen data, addressing a bottleneck in visualization for domains such as bioinformatics. The advance is incremental because it builds on RF-PHATE.

The paper tackles out-of-sample extension for supervised dimensionality reduction by combining random forest proximities with autoencoders. Using proximity-based prototypes cuts training time by 40% without compromising extension quality, and the method maintains consistent quality when trained on only 10% of the training data.

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.
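To make the role of random forest proximities concrete, here is a minimal sketch of how a proximity vector for an unlabeled new point can be computed once a forest is trained. It assumes the classic leaf co-occurrence definition (the fraction of trees in which two points land in the same leaf), which may differ from the proximity definition the paper actually uses. The function name `rf_proximities` and the leaf indices are hypothetical; in practice the indices could come from a fitted forest, for example via scikit-learn's `RandomForestClassifier.apply`.

```python
from typing import List, Sequence


def rf_proximities(
    query_leaves: Sequence[Sequence[int]],
    train_leaves: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Leaf co-occurrence proximities (an assumed, simplified definition).

    Each row of `query_leaves` / `train_leaves` holds the leaf index a point
    reaches in every tree. Entry [i][j] of the result is the fraction of
    trees in which query point i and training point j share a leaf.
    """
    n_trees = len(train_leaves[0])
    prox = []
    for q in query_leaves:
        if len(q) != n_trees:
            raise ValueError("every point needs one leaf index per tree")
        row = [
            sum(a == b for a, b in zip(q, t)) / n_trees
            for t in train_leaves
        ]
        prox.append(row)
    return prox


# Hypothetical leaf indices: 3 training points routed through 4 trees.
train_leaves = [
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
]

# A new point only needs to be routed through the trees; no label is used.
new_leaves = [[0, 2, 4, 0]]

train_prox = rf_proximities(train_leaves, train_leaves)
new_prox = rf_proximities(new_leaves, train_leaves)
print(new_prox)  # [[0.5, 0.75, 0.5]]
```

Because the new point's proximity vector depends only on which leaves it reaches, no label is needed at extension time. A vector like `new_prox[0]` is the kind of input or reconstruction target a proximity-based autoencoder could work with, though the network architectures evaluated in the paper are not reproduced here.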
