CL · AI · Sep 11, 2025

ViRanker: A BGE-M3 & Blockwise Parallel Transformer Cross-Encoder for Vietnamese Reranking

arXiv:2509.09131v1
Originality: Incremental advance
AI Analysis

This work aims to improve retrieval systems for Vietnamese, an underrepresented language. The advance is incremental, since it builds on existing methods such as BGE-M3 and the Blockwise Parallel Transformer.

The paper tackles the lack of competitive reranking models for Vietnamese, a low-resource language, by introducing ViRanker, which achieves strong early-rank accuracy on the MMARCO-VI benchmark, surpassing multilingual baselines and competing closely with PhoRanker.

This paper presents ViRanker, a cross-encoder reranking model tailored to the Vietnamese language. Built on the BGE-M3 encoder and enhanced with the Blockwise Parallel Transformer, ViRanker addresses the lack of competitive rerankers for Vietnamese, a low-resource language with complex syntax and diacritics. The model was trained on an 8 GB curated corpus and fine-tuned with hybrid hard-negative sampling to strengthen robustness. Evaluated on the MMARCO-VI benchmark, ViRanker achieves strong early-rank accuracy, surpassing multilingual baselines and competing closely with PhoRanker. By releasing the model openly on Hugging Face, we aim to support reproducibility and encourage wider adoption in real-world retrieval systems. Beyond Vietnamese, this study illustrates how careful architectural adaptation and data curation can advance reranking in other underrepresented languages.
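The abstract reports "early-rank accuracy" without naming the metrics in this summary. As a minimal sketch, assuming the usual choices for passage reranking (MRR@k and NDCG@k with binary relevance), the following shows how such scores could be computed for a single query. The function names, passage ids, and the cutoff k=10 are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence, Set


def mrr_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage in the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """NDCG@k with binary relevance (gain 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked_ids[:k], start=1)
        if pid in relevant
    )
    # Ideal ranking places every relevant passage at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical reranker output for one query: passage ids, best first.
ranked = ["p7", "p2", "p9", "p4"]
relevant = {"p2"}

print(mrr_at_k(ranked, relevant))   # first relevant hit is at rank 2
print(ndcg_at_k(ranked, relevant))
```

Both metrics reward placing a relevant passage near the top of the list, which is why they are commonly used to compare rerankers; benchmark-level scores are the mean over all queries.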
