CL IR Apr 11, 2024

HLTCOE at TREC 2023 NeuCLIR Track

Eugene Yang, Dawn Lawrie, James Mayfield

arXiv:2404.08118v11.91 citations h-index: 32 TREC

Originality Synthesis-oriented

AI Analysis

This work addresses cross-language information retrieval for news and technical documents, but it appears incremental as it builds on existing methods like ColBERT and mT5.

The HLTCOE team tackled cross-language information retrieval by applying PLAID, mT5 reranking, and document translation to the TREC 2023 NeuCLIR track. The PLAID models were trained with techniques such as translate-train and multilingual translate-train. The abstract reports no concrete effectiveness numbers.

The HLTCOE team applied PLAID, an mT5 reranker, and document translation to the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and training techniques: the English model released with ColBERT v2, translate-train (TT), Translate Distill (TD), and multilingual translate-train (MTT). TT trains a ColBERT model with English queries and with passages from the MS-MARCO v1 collection that have been automatically translated into the document language. This results in three cross-language models for the track, one per language. MTT creates a single model for all three document languages by combining the translations of MS-MARCO passages in all three languages into mixed-language batches. Thus the model learns about matching queries to passages simultaneously in all languages. Distillation uses scores from the mT5 model over non-English translated document pairs to learn how to score query-document pairs. The team submitted runs to all NeuCLIR tasks: the CLIR and MLIR news tasks as well as the technical documents task.
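
The difference between TT and MTT batching, and the idea of learning from teacher scores, can be illustrated with a small sketch. This is not the team's code. The toy data, the language codes, the function names (`tt_batches`, `mtt_batches`, `distill_loss`), the shuffle-based mixing, and the choice of a KL-divergence loss are all assumptions made for illustration; the abstract does not specify how batches are mixed or which distillation loss is used.

```python
import math
import random
from typing import Dict, List, Tuple

# (english_query, translated_passage, passage_language); hypothetical toy format
Triple = Tuple[str, str, str]

def tt_batches(data: Dict[str, List[Triple]], lang: str, batch_size: int) -> List[List[Triple]]:
    """Translate-train: every batch holds passages in one document language."""
    items = data[lang]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def mtt_batches(data: Dict[str, List[Triple]], batch_size: int, seed: int = 0) -> List[List[Triple]]:
    """Multilingual translate-train: pool all languages, then batch.

    Shuffling the pooled examples is an assumed way to obtain
    mixed-language batches, not a detail taken from the source.
    """
    pooled = [t for triples in data.values() for t in triples]
    random.Random(seed).shuffle(pooled)
    return [pooled[i:i + batch_size] for i in range(0, len(pooled), batch_size)]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_scores: List[float], student_scores: List[float]) -> float:
    """KL(teacher || student) over one query's candidate passages (assumed loss)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Placeholder language codes and synthetic examples.
LANGS = ["zh", "fa", "ru"]
toy = {
    lang: [(f"query {i}", f"passage {i} in {lang}", lang) for i in range(4)]
    for lang in LANGS
}

single = tt_batches(toy, "zh", batch_size=4)   # one model per language
mixed = mtt_batches(toy, batch_size=4)         # one model for all languages
print([sorted({t[2] for t in b}) for b in single])
print([sorted({t[2] for t in b}) for b in mixed])
print(distill_loss([3.0, 1.0, 0.5], [2.0, 1.5, 0.5]))
```

Under TT, running `tt_batches` once per language yields three separate training streams, which corresponds to the three per-language models. Under MTT, a single stream covers all languages, so one model sees them together. The loss is zero when the student's score distribution matches the teacher's and grows as they diverge.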
