mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
This work addresses the computational expense and instability in geometry alignment for text embeddings, offering a scalable solution for domains requiring efficient embedding space alignment.
The paper tackles the problem of aligning text embedding spaces without parallel data by introducing mini-vec2vec, a more efficient and robust linear transformation method that matches or exceeds the performance of the original vec2vec while reducing computational cost by orders of magnitude.
We build upon vec2vec, a procedure designed to align text embedding spaces without parallel data. vec2vec finds a near-perfect alignment, but it is expensive and unstable. We present mini-vec2vec, a simple and efficient alternative that is highly robust and requires substantially less computation. Moreover, the learned mapping is a linear transformation. Our method consists of three main stages: a tentative matching of pseudo-parallel embedding vectors, transformation fitting, and iterative refinement. Our linear alternative is orders of magnitude more efficient than the original instantiation of vec2vec, while matching or exceeding its results. The method's stability and interpretable algorithmic steps facilitate scaling and unlock opportunities for adoption in new domains and fields.
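To make the fitting and refinement stages concrete, the following is a minimal sketch under stated assumptions, not the implementation behind mini-vec2vec. The abstract does not say how the tentative matching is computed, so the sketch takes it as an input (`init_match`, a hypothetical array of guessed target indices). It also assumes the linear map is constrained to be orthogonal and fitted by orthogonal Procrustes, and that refinement alternates nearest-neighbor re-matching with refitting. All function names are illustrative.

```python
import numpy as np


def fit_orthogonal(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src @ W - tgt||_F.

    Assumption: the abstract only says the mapping is linear; restricting it
    to an orthogonal matrix is a choice made for this sketch.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_targets(mapped, tgt):
    """For each mapped source row, index of the most cosine-similar target row."""
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


def align(src, tgt, init_match, n_iters=5):
    """Alternate transformation fitting and re-matching.

    init_match[i] is a tentative (possibly wrong) guess of which target row
    corresponds to source row i; producing that guess is outside this sketch.
    """
    match = np.asarray(init_match)
    for _ in range(n_iters):
        w = fit_orthogonal(src, tgt[match])      # transformation fitting
        match = nearest_targets(src @ w, tgt)    # refinement: re-match pairs
    return fit_orthogonal(src, tgt[match]), match
```

Calling `align(src, tgt, init_match)` returns the fitted matrix and the refined matching, so `src @ w` lands in the target space. The loop can only help when the tentative matching is mostly right: if too many initial pairs are wrong, the first fit is poor and re-matching will not recover.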