Function Preserving Projection for Scalable Exploration of High-Dimensional Data
This work addresses the need for interpretable data exploration in high-dimensional datasets, offering an approach that goes beyond conventional dimension reduction methods.
The paper tackles the problem of discovering interpretable relationships between user-selected response functions and high-dimensional data. It introduces function preserving projections (FPP), a linear projection technique that reveals non-linear patterns and scales to millions of samples.
We present function preserving projections (FPP), a scalable linear projection technique for discovering interpretable relationships in high-dimensional data. Conventional dimension reduction methods aim to maximally preserve the global and/or local geometric structure of a dataset. In practice, however, one is often more interested in determining how one or more user-selected response functions can be explained by the data. To intuitively connect the responses to the data, FPP constructs 2D linear embeddings optimized to reveal interpretable yet potentially non-linear patterns of the response functions. More specifically, FPP is designed to (i) produce human-interpretable embeddings; (ii) capture non-linear relationships; (iii) allow the simultaneous use of multiple response functions; and (iv) scale to millions of samples. Using FPP on real-world datasets, one can obtain fundamentally new insights about high-dimensional relationships in large-scale data that could not be achieved using existing dimension reduction methods.
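To make the core idea concrete, the following is a minimal sketch of searching for a 2D linear projection under which a response becomes a simple function of the projected coordinates. The abstract does not specify the objective or the optimizer, so everything here is an assumption made for illustration: the polynomial regressor, the R^2 score, the random hill-climbing search, and all function names (poly_features, fit_score, search_projection) are hypothetical stand-ins, not the authors' method.

```python
import numpy as np

def poly_features(Z, degree=3):
    # All monomials z1^i * z2^j with i + j <= degree (includes the constant term).
    cols = [Z[:, 0] ** i * Z[:, 1] ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def fit_score(X, y, P, degree=3):
    # R^2 of a polynomial regressor fitted on the 2D projection Z = X P.
    # Assumption: a low-degree polynomial stands in for the unspecified
    # non-linear model of the response.
    F = poly_features(X @ P, degree)
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    resid = y - F @ coef
    return 1.0 - resid.var() / y.var()

def random_orthonormal(d, rng):
    # Random d x 2 matrix with orthonormal columns (reduced QR).
    Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))
    return Q

def search_projection(X, y, n_iter=200, step=0.3, seed=0):
    # Hill climbing: perturb the current projection, re-orthonormalize,
    # and keep the candidate only if the score improves.
    # Assumption: this replaces whatever optimizer the paper actually uses.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_P = random_orthonormal(d, rng)
    best = fit_score(X, y, best_P)
    for _ in range(n_iter):
        cand, _ = np.linalg.qr(best_P + step * rng.standard_normal((d, 2)))
        score = fit_score(X, y, cand)
        if score > best:
            best_P, best = cand, score
    return best_P, best

# Synthetic example: the response depends non-linearly on a 2D subspace
# of a 6-dimensional dataset.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((2000, 6))
y = X[:, 0] ** 2 + X[:, 1]

P, score = search_projection(X, y)
Z = X @ P  # 2D embedding; plot Z colored by y to inspect the pattern
```

The projection stays linear, so each embedding axis is a weighted combination of the original features and remains interpretable, while the non-linearity lives only in the model fitted on the 2D coordinates. Multiple response functions could be handled in the same sketch by summing the scores of one regressor per response, again as an assumption rather than the paper's formulation.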