LG ML · Mar 2, 2021

Factoring out prior knowledge from low-dimensional embeddings

Edith Heiter, Jonas Fischer, Jilles Vreeken

arXiv:2103.01828v13.11 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more informative visualizations in data analysis by accounting for existing knowledge, though it is incremental as it builds on established embedding techniques like tSNE and UMAP.

The paper tackles the problem of visualizing high-dimensional data with low-dimensional embeddings that account for prior knowledge. It proposes two methods, JEDI and CONFETTI, that factor out known information given as distance matrices. On synthetic and real-world data, both methods reveal meaningful structure that would otherwise remain hidden.

Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.
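The abstract names Jensen-Shannon divergence as the ingredient JEDI uses to adapt the tSNE objective. As background only, here is a minimal sketch of that divergence for two discrete distributions. It is not the JEDI objective itself, which the abstract does not spell out; the function names `kl_divergence` and `js_divergence` are illustrative and do not come from the paper.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Entries where p == 0 contribute nothing to the sum.
    Assumes q > 0 wherever p > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    Inputs are normalized to sum to 1 before use.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture; positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    print(js_divergence(p, q))
    print(js_divergence(q, p))  # same value: the divergence is symmetric
```

Unlike the Kullback-Leibler divergence in the standard tSNE objective, this quantity is symmetric in its arguments and bounded above by log 2 (in nats), and it stays finite even when the two distributions have disjoint support.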
