CL, LG | Feb 22, 2024

2D Matryoshka Sentence Embeddings

Xianming Li, Zongxi Li, Jing Li, Haoran Xie, Qing Li

arXiv:2402.14776v3 | 4.88 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses computational constraints that NLP practitioners face when deploying sentence embeddings, though it is an incremental advance over prior work on Matryoshka Representation Learning.

The paper tackles the inflexibility of fixed-length sentence embeddings by introducing 2D Matryoshka Sentence Embeddings (2DMSE), which supports elastic settings for both embedding sizes and Transformer layers, achieving competitive performance on semantic textual similarity tasks while improving efficiency.

Common approaches rely on fixed-length embedding vectors from language models as sentence embeddings for downstream tasks such as semantic textual similarity (STS). Such methods are limited in their flexibility due to unknown computational constraints and budgets across various applications. Matryoshka Representation Learning (MRL) (Kusupati et al., 2022) encodes information at finer granularities, i.e., with lower embedding dimensions, to adaptively accommodate ad hoc tasks. Similar accuracy can be achieved with a smaller embedding size, leading to speedups in downstream tasks. Despite its improved efficiency, MRL still requires traversing all Transformer layers before obtaining the embedding, which remains the dominant factor in time and memory consumption. This prompts consideration of whether the fixed number of Transformer layers affects representation quality and whether using intermediate layers for sentence representation is feasible. In this paper, we introduce a novel sentence embedding model called Two-dimensional Matryoshka Sentence Embedding (2DMSE) (our code is available at https://github.com/SeanLee97/AnglE/blob/main/README_2DMSE.md). It supports elastic settings for both embedding sizes and Transformer layers, offering greater flexibility and efficiency than MRL. We conduct extensive experiments on STS tasks and downstream applications. The experimental results demonstrate the effectiveness of our proposed model in dynamically supporting different embedding sizes and Transformer layers, allowing it to be highly adaptable to various scenarios.
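
To make the two elastic axes concrete, the following is a minimal sketch of how a sub-embedding could be selected at inference time: read the pooled sentence vector from a chosen layer, then keep only its leading dimensions. It is an illustration under stated assumptions, not the authors' implementation. The function names `sub_embedding` and `cosine`, the 1-based layer convention, and the toy vectors are all hypothetical, and the training objective that makes shallow layers and short prefixes useful is not shown.

```python
import math


def sub_embedding(layer_vectors, layer, dim):
    """Return the first `dim` entries of one layer's sentence vector, L2-normalized.

    layer_vectors: list of pooled sentence vectors, one per Transformer layer
                   (index 0 holds the first layer). Hypothetical input format.
    layer: 1-based index of the layer to read from (assumed convention).
    dim: number of leading dimensions to keep.
    """
    if not 1 <= layer <= len(layer_vectors):
        raise ValueError("layer out of range")
    vec = layer_vectors[layer - 1]
    if not 1 <= dim <= len(vec):
        raise ValueError("dim out of range")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # avoid dividing by zero for an all-zero prefix
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


# Toy per-layer vectors for two sentences (made-up numbers, not model output).
sent_a = [[0.9, 0.1, 0.3, -0.2], [0.8, 0.2, 0.1, 0.4]]
sent_b = [[0.7, 0.3, -0.1, 0.5], [0.6, 0.4, 0.2, 0.3]]

# Full setting: last layer, all dimensions.
full = cosine(sub_embedding(sent_a, 2, 4), sub_embedding(sent_b, 2, 4))
# Cheaper setting: first layer only, half the dimensions.
small = cosine(sub_embedding(sent_a, 1, 2), sub_embedding(sent_b, 1, 2))
```

In a real encoder the saving along the layer axis comes from stopping the forward pass at the chosen layer; this sketch only shows the selection step applied to vectors that are already computed.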
