ML LG CO | May 10, 2025

Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction

Michael W. Trosset, Kaiyi Tan, Minh Tang, Carey E. Priebe

arXiv:2505.06756v1 | 4.5 | h-index: 43

Originality: Synthesis-oriented

AI Analysis

This is an incremental survey that organizes existing methods for embedding new points into vector diagrams, relevant for researchers in dimensionality reduction and data analysis.

The paper surveys kernel methods for out-of-sample embedding using proximity data, categorizing them into two strategies: projection and restricted reconstruction, with the latter simplified to a unidimensional search.

The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
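
To make the two strategies concrete, here is a minimal NumPy sketch for the classical multidimensional scaling case. It is an illustration under stated assumptions, not the paper's own derivation: the proximities are assumed to be squared Euclidean-like dissimilarities, the retained eigenvalues are assumed to be strictly positive, and the function names (`cmds_fit`, `centered_products`, `project`, `restricted_objective`) are hypothetical. Projection is written in the familiar form used to add a point to a principal component analysis. For restricted reconstruction, only one common least-squares objective is shown, the error of reproducing the augmented inner-product matrix while the original configuration is held fixed; the reduction of its minimization to a unidimensional search is not reproduced here.

```python
import numpy as np

def cmds_fit(D2, d):
    """Classical MDS: embed n points in R^d from squared dissimilarities D2 (n x n)."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # doubly centered inner products
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]        # indices of the d largest
    lam, U = evals[idx], evecs[:, idx]
    X = U * np.sqrt(lam)                     # assumes lam > 0
    return X, lam

def centered_products(D2, a2):
    """Inner products of a new point with the old points (b) and with itself (beta),
    centered at the centroid of the original n points.
    a2 holds squared dissimilarities from the new point to the originals."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * J @ (a2 - D2.mean(axis=1))
    beta = a2.mean() - 0.5 * D2.mean()
    return b, beta

def project(X, lam, b):
    """Projection strategy: y = Lambda^{-1} X^T b, as when adding a point to a PCA."""
    return (X.T @ b) / lam

def restricted_objective(y, X, b, beta):
    """Assumed least-squares restricted-reconstruction objective: error in
    reproducing the augmented inner-product matrix with X held fixed."""
    return 2.0 * np.sum((X @ y - b) ** 2) + (y @ y - beta) ** 2

# Synthetic check: exact Euclidean data in the plane (illustrative only).
rng = np.random.default_rng(0)
P = rng.normal(size=(6, 2))                  # original points
q = rng.normal(size=2)                       # out-of-sample point
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
a2 = ((P - q) ** 2).sum(axis=1)

X, lam = cmds_fit(D2, d=2)
b, beta = centered_products(D2, a2)
y = project(X, lam, b)
obj_at_projection = restricted_objective(y, X, b, beta)
```

When the proximities are exactly Euclidean in the chosen dimension, as in this synthetic check, the projected point reproduces the new dissimilarities and also drives the restricted-reconstruction objective to zero, so the two strategies agree. They can differ when the data are noisy, non-Euclidean, or embedded in fewer dimensions than they require, and in that case the objective would have to be minimized numerically.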
