AI · DB · IR · May 29

Vector Linking via Cross-Model Local Isometric Consistency

arXiv:2605.31100 · 24.2 · Has Code
Predicted impact: top 19% in AI (last 90 days) · Originality: Highly original
AI Analysis

This work is significant for users of vector databases and those performing cross-model clustering, as it enables the integration of different embedding spaces without direct access to the models or training data.

This paper addresses Vector Linking, the problem of finding correspondences between object embeddings generated by different black-box encoders from partially overlapping datasets. The authors propose an iterative, reference-based geometric embedding hashing method that leverages local geometric consistency between independently trained contrastive encoders to recover vector links from a small initial set of paired anchors.
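To make the reference-based idea concrete, here is a minimal Python sketch. It assumes Euclidean distances and a fixed list of paired anchors, where `anchors_x[k]` and `anchors_y[k]` are the same object embedded by the two encoders. Each vector is represented by its distances to the anchors in its own model. Those distances are divided by their mean so that a global scale factor cancels. Candidate links are then taken as mutual nearest neighbours between the two sets of signatures. The function names `signature` and `candidate_links` are hypothetical. The mutual-nearest-neighbour step stands in for the paper's hash-space matching, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def signature(v: Vector, anchors: List[Vector]) -> List[float]:
    """Distances from v to each anchor in the same model, divided by their mean.

    Dividing by the mean cancels a global scale factor between encoders
    (an assumption of this sketch, not the paper's exact normalisation).
    """
    dists = [math.dist(v, a) for a in anchors]
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists] if mean > 0 else dists


def candidate_links(xs: List[Vector], ys: List[Vector],
                    anchors_x: List[Vector], anchors_y: List[Vector]) -> Dict[int, int]:
    """Propose links i -> j as mutual nearest neighbours in signature space.

    anchors_x[k] and anchors_y[k] must be the same object seen by model X and model Y.
    """
    sig_x = [signature(v, anchors_x) for v in xs]
    sig_y = [signature(v, anchors_y) for v in ys]

    def nearest(s: List[float], pool: List[List[float]]) -> int:
        # Index of the signature in `pool` closest to `s`.
        return min(range(len(pool)), key=lambda j: math.dist(s, pool[j]))

    forward = {i: nearest(s, sig_y) for i, s in enumerate(sig_x)}
    backward = {j: nearest(s, sig_x) for j, s in enumerate(sig_y)}
    # Keep only pairs that agree in both directions.
    return {i: j for i, j in forward.items() if backward[j] == i}


if __name__ == "__main__":
    def to_y(p):
        # Toy "model Y": model X rotated by 90 degrees, scaled by 3 and shifted.
        return (5 - 3 * p[1], -2 + 3 * p[0])

    anchors_x = [(0, 0), (4, 0), (0, 3)]
    anchors_y = [to_y(a) for a in anchors_x]
    xs = [(1, 1), (2, 5), (-3, 2), (6, -1)]
    ys = [to_y(p) for p in reversed(xs)]  # same objects, shuffled order
    print(candidate_links(xs, ys, anchors_x, anchors_y))  # {0: 3, 1: 2, 2: 1, 3: 0}
```

The toy transform preserves every distance up to the factor 3, so all links are recovered. With real encoders, only short-range distances are expected to be preserved, which is why the choice of anchors matters in practice.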

We study Vector Linking: given two embedding clouds produced by different black-box encoders over partially overlapping datasets, recover cross-model object correspondences using only vectors. Empirically and theoretically, we show that independently trained contrastive encoders exhibit local geometric consistency: short-range distances are approximately preserved up to a scale factor, while long-range distances are not, owing to model-specific distortion. Building on this, we propose an iterative, reference-based geometric embedding hashing method that recovers vector links from a tiny seed set of paired anchors. It represents each vector by its distances to sampled paired anchors, proposes candidate links via hash-space matching, and aggregates evidence across views in a Beta-Bernoulli posterior to bootstrap high-confidence links as new anchors. Experiments across multiple benchmarks and embedding model pairs demonstrate accurate and robust linking under varying overlap, seed budgets, and out-of-domain anchors, with applications to vector database integration and cross-model clustering. Code is available at https://github.com/DBgroup-Edinburgh/VecLinking.
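The evidence-aggregation step can be illustrated with a small Beta-Bernoulli sketch, which makes several assumptions:

- Each view (for example, one random subset of anchors) proposes at most one target per source vector.
- Every view that proposes a link for source `i` counts as one Bernoulli trial for each candidate pair `(i, j)`.
- The prior is a uniform Beta(1, 1).
- Pairs whose posterior mean clears a hypothetical threshold of 0.8 are promoted to new anchors.

These modelling choices, and the names `aggregate_views` and `promote`, are illustrative. They are not the paper's exact formulation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def aggregate_views(view_proposals: List[Dict[int, int]],
                    prior_a: float = 1.0, prior_b: float = 1.0) -> Dict[Pair, float]:
    """Posterior mean of a Beta(prior_a, prior_b)-Bernoulli model for each proposed pair.

    Each view that proposes some link for source i is one trial for every pair
    (i, j); it is a success for (i, j) only if that view proposed j.
    """
    successes: Dict[Pair, int] = defaultdict(int)
    trials: Dict[int, int] = defaultdict(int)
    for proposals in view_proposals:
        for i, j in proposals.items():
            successes[(i, j)] += 1
            trials[i] += 1
    # Posterior mean of Beta(a + s, b + n - s) is (a + s) / (a + b + n).
    return {
        (i, j): (prior_a + s) / (prior_a + prior_b + trials[i])
        for (i, j), s in successes.items()
    }


def promote(posterior: Dict[Pair, float], threshold: float = 0.8) -> Dict[int, int]:
    """For each source i, keep its best target j if the posterior mean clears the threshold."""
    best: Dict[int, Tuple[int, float]] = {}
    for (i, j), p in posterior.items():
        if p >= threshold and (i not in best or p > best[i][1]):
            best[i] = (j, p)
    return {i: j for i, (j, _) in best.items()}


if __name__ == "__main__":
    # Five hypothetical views; each maps source index -> proposed target index.
    views = [
        {0: 0, 1: 1, 2: 3},
        {0: 0, 1: 1, 2: 3},
        {0: 0, 1: 1, 2: 3},
        {0: 0, 1: 2, 2: 3},
        {0: 0, 1: 2},
    ]
    posterior = aggregate_views(views)
    print(promote(posterior))  # {0: 0, 2: 3}
```

In this toy run, source 0 is linked consistently in all five views, giving a posterior mean of 6/7. Source 2 is linked consistently in the four views that cover it, giving 5/6. Source 1 splits its votes between two targets, so it stays unlinked. In a full bootstrapping loop, the promoted pairs would be appended to the anchor set and the signatures recomputed for the next iteration. This sketch does not enforce one-to-one matching on the target side.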

