CL · Oct 9, 2025

Multilingual Generative Retrieval via Cross-lingual Semantic Compression

arXiv:2510.07812v1 · 3 citations · h-index: 14 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses efficient and accurate multilingual retrieval and represents a strong incremental improvement over existing methods.

The paper tackles cross-lingual identifier misalignment and identifier inflation in multilingual generative retrieval by proposing MGR-CSC, which improves retrieval accuracy by 6.83% on mMarco100k and 4.77% on mNQ320k while reducing identifier length by 74.51% and 78.2%, respectively.

Generative Information Retrieval is an emerging retrieval paradigm that exhibits remarkable performance in monolingual scenarios. However, applying these methods to multilingual retrieval still encounters two primary challenges: cross-lingual identifier misalignment and identifier inflation. To address these limitations, we propose Multilingual Generative Retrieval via Cross-lingual Semantic Compression (MGR-CSC), a novel framework that unifies semantically equivalent multilingual keywords into shared atoms to align semantics and compress the identifier space. We also propose a dynamic multi-step constrained decoding strategy for retrieval. MGR-CSC improves cross-lingual alignment by assigning consistent identifiers and enhances decoding efficiency by reducing redundancy. Experiments demonstrate that MGR-CSC achieves outstanding retrieval accuracy, improving by 6.83% on mMarco100k and 4.77% on mNQ320k, while reducing document identifier length by 74.51% and 78.2%, respectively.
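The two ideas in the abstract, shared atoms and constrained decoding, can be illustrated with a toy sketch. This is not the authors' implementation: the lexicon `KEYWORD_TO_ATOM`, the helper names, and the example documents are all hypothetical, and a plain static prefix trie stands in for the paper's dynamic multi-step constrained decoding, whose details the abstract does not give.

```python
# Hypothetical toy lexicon: semantically equivalent keywords in different
# languages map to one shared atom ID (assumed, not taken from the paper).
KEYWORD_TO_ATOM = {
    "cat": 0, "gato": 0, "chat": 0,
    "food": 1, "comida": 1, "nourriture": 1,
    "health": 2, "salud": 2, "santé": 2,
}


def compress(keywords):
    """Map keywords to atoms, dropping repeats while keeping first-seen order."""
    atoms = []
    for kw in keywords:
        atom = KEYWORD_TO_ATOM[kw]
        if atom not in atoms:
            atoms.append(atom)
    return tuple(atoms)


def build_trie(identifiers):
    """Build a nested-dict prefix trie over atom sequences."""
    trie = {}
    for ident in identifiers:
        node = trie
        for atom in ident:
            node = node.setdefault(atom, {})
    return trie


def allowed_next(trie, prefix):
    """Return the atoms that may follow `prefix` in some valid identifier."""
    node = trie
    for atom in prefix:
        node = node.get(atom)
        if node is None:
            return set()  # prefix matches no known identifier
    return set(node)


# Hypothetical documents described by keywords in three languages.
docs = {
    "en_doc": ["cat", "food"],
    "es_doc": ["gato", "comida"],
    "fr_doc": ["chat", "santé"],
}
identifiers = {name: compress(kws) for name, kws in docs.items()}
trie = build_trie(identifiers.values())

print(identifiers)
print(allowed_next(trie, (0,)))
```

In this sketch the English and Spanish documents receive the same identifier because their keywords map to the same atoms, which mirrors the alignment goal. A decoder restricted to `allowed_next` at each step can only emit atom sequences that correspond to an existing identifier.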

