CL IR LG

Oct 13, 2022

MTEB: Massive Text Embedding Benchmark

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, Nils Reimers

Hugging Face

arXiv:2210.07316v3

34.9963 citations

h-index: 48

Has Code

Originality: Highly original

AI Analysis

MTEB gives the ML/AI community a foundational benchmark for tracking progress in text embeddings, addressing a critical evaluation gap.

The authors tackled the lack of comprehensive evaluation for text embeddings by introducing the Massive Text Embedding Benchmark (MTEB), which spans 8 tasks, 58 datasets, and 112 languages, and found that no single method dominates across all tasks.

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.
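The finding that no single method dominates can be made concrete with a small sketch. It is an illustration only, not the MTEB code or its API: the model names, task names, and scores below are made-up placeholders rather than results from the paper, and the helper functions are hypothetical. It shows how one model can have the best average while a different model wins each individual task.

```python
from statistics import mean

# Hypothetical placeholder scores (NOT results from the paper).
# Layout: model name -> task name -> score on that task.
scores = {
    "model_a": {"sts": 0.85, "clustering": 0.40, "reranking": 0.55},
    "model_b": {"sts": 0.78, "clustering": 0.46, "reranking": 0.52},
    "model_c": {"sts": 0.80, "clustering": 0.42, "reranking": 0.60},
}


def best_per_task(scores):
    """Return a dict mapping each task to its top-scoring model."""
    tasks = sorted({task for per_task in scores.values() for task in per_task})
    return {task: max(scores, key=lambda m: scores[m][task]) for task in tasks}


def average_score(scores):
    """Return a dict mapping each model to its mean score over all tasks."""
    return {model: mean(per_task.values()) for model, per_task in scores.items()}


def dominant_model(scores):
    """Return the model that wins every task, or None if there is none."""
    winners = set(best_per_task(scores).values())
    return winners.pop() if len(winners) == 1 else None


winners = best_per_task(scores)
averages = average_score(scores)
dominant = dominant_model(scores)

print("Winner per task:", winners)
print("Best average:", max(averages, key=averages.get))
print("Dominant model:", dominant)
```

With these toy numbers each task has a different winner, so `dominant_model` returns `None` even though one model has the highest average. This is the situation in which a single-task evaluation would be misleading.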
