CL, Oct 23, 2023

On the Dimensionality of Sentence Embeddings

arXiv:2310.15285v1134 citations
h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses a fundamental but underexplored aspect of sentence embedding learning, offering practical improvements for NLP applications, though it is incremental in nature.

The paper studies how to choose the dimensionality of sentence embeddings and shows that the optimal dimension is usually smaller than the default value. It also proposes a two-step training method, in which the encoder and the pooler are optimized separately, and reports significantly improved performance for low-dimensional embeddings on seven STS tasks and seven sentence classification tasks.

Learning sentence embeddings is a fundamental problem in natural language processing. While existing research primarily focuses on enhancing the quality of sentence embeddings, the exploration of sentence embedding dimensions is limited. Here we present a comprehensive and empirical analysis of the dimensionality of sentence embeddings. First, we demonstrate that the optimal dimension of sentence embeddings is usually smaller than the default value. Subsequently, to compress the dimension of sentence embeddings with minimum performance degradation, we identify two components contributing to the overall performance loss: the encoder's performance loss and the pooler's performance loss. Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimension scenarios. Experimental results on seven STS tasks and seven sentence classification tasks demonstrate that our method significantly improves the performance of low-dimensional sentence embeddings.

