IR · May 13

Granite Embedding Multilingual R2 Models

arXiv:2605.13521 · 34.0 · Has Code
Predicted impact: top 1% in IR · last 90 days
Originality: Incremental advance
AI Analysis

This work provides enterprise-grade multilingual embedding models with strong performance, addressing the need for efficient and effective retrieval across diverse languages and domains.

The paper introduces multilingual Granite Embedding R2 models for dense retrieval across 200+ languages, achieving state-of-the-art performance on multilingual, code, long-document, and reasoning retrieval tasks. The compact 97M-parameter model achieves the highest retrieval score among open multilingual models under 100M parameters.

We introduce the multilingual Granite Embedding R2 models, a family of encoder-based embedding models for enterprise-scale dense retrieval across 200+ languages. Extending our English-focused R2 release, these models add enhanced support for 52 languages and programming code, a 32,768-token context window (a 64x expansion over R1), and state-of-the-art overall performance across multilingual and cross-lingual text search, code retrieval, long-document search, and reasoning retrieval datasets. The release consists of two bi-encoder models based on the ModernBERT architecture with an expanded multilingual vocabulary: a 311M-parameter full-size model, and a 97M-parameter compact model, built via model pruning and vocabulary selection, that achieves the highest retrieval score of any open multilingual embedding model under 100M parameters. The full-size model also supports Matryoshka Representation Learning for flexible embedding dimensionality. Both models are trained on enterprise-appropriate data with governance oversight and released under the Apache 2.0 license at https://huggingface.co/collections/ibm-granite. They are designed to support responsible use and enable unrestricted research and enterprise adoption.
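The abstract mentions bi-encoder dense retrieval and Matryoshka-style flexible embedding dimensionality. The following is a minimal sketch of what that usually means in practice: query and document vectors are compared by cosine similarity, and a Matryoshka-trained embedding can be shortened by keeping a prefix of its dimensions and re-normalizing. The vectors here are hand-written toy values, not outputs of any Granite model, and the helper names (`l2_normalize`, `truncate_embedding`, `cosine_rank`) are hypothetical and chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize (Matryoshka-style truncation)."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def cosine_rank(query, docs, dim):
    """Return document indices sorted by cosine similarity to the query at `dim` dimensions."""
    q = truncate_embedding(query, dim)
    scored = []
    for idx, doc in enumerate(docs):
        d = truncate_embedding(doc, dim)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Toy 4-dimensional "embeddings" (assumed values, not real model output).
query = [0.9, 0.1, 0.0, 0.2]
docs = [
    [0.8, 0.2, 0.1, 0.1],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0, 0.0],  # nearly orthogonal to the query
    [0.5, 0.5, 0.9, 0.0],  # partially aligned with the query
]

full_ranking = cosine_rank(query, docs, dim=4)   # use every dimension
short_ranking = cosine_rank(query, docs, dim=2)  # use only a 2-dimensional prefix
print(full_ranking, short_ranking)
```

With a real Matryoshka-trained model, the point of truncation is to trade a small amount of retrieval quality for smaller indexes and faster search. Whether a given prefix length preserves rankings depends on how the model was trained, so it should be checked on the target data.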

