CL, AI, IR, LG | Jan 12, 2024

Mapping Transformer Leveraged Embeddings for Cross-Lingual Document Representation

arXiv:2401.06583v1 | 1 citation | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the limitation of recommendation systems in handling documents in languages different from the query language, which is an incremental improvement for users seeking multilingual content.

This research tackled the problem of cross-lingual document recommendation by mapping Transformer Leveraged Document Representations (TLDRs) to a cross-lingual domain using four multilingual pre-trained transformer models and three mapping methods across 20 language pairs, resulting in improved effectiveness as measured by metrics like Mate Retrieval Rate and Reciprocal Rank.

Document recommendation systems have become tools for finding relevant content on the Web. However, these systems are limited when it comes to recommending documents in languages other than the query language, so they may overlook relevant resources in other languages. This research focuses on representing documents across languages using Transformer Leveraged Document Representations (TLDRs) that are mapped to a cross-lingual domain. Four multilingual pre-trained transformer models (mBERT, mT5, XLM-RoBERTa, ErnieM) were evaluated using three mapping methods across 20 language pairs, covering the combinations of five selected languages of the European Union. Metrics such as Mate Retrieval Rate and Reciprocal Rank were used to measure the effectiveness of mapped TLDRs compared to non-mapped ones. The results highlight the power of cross-lingual representations achieved through pre-trained transformers and mapping approaches, suggesting a promising direction for extending beyond connections between two specific languages.
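To make the two evaluation metrics concrete, here is a minimal sketch in plain Python. It assumes the usual reading of these metrics: each source-language document has exactly one "mate" (its counterpart in the target language, stored at the same index), documents are compared by cosine similarity, Mate Retrieval Rate is the share of queries whose mate is ranked first, and Reciprocal Rank is 1 divided by the mate's rank, averaged over queries. The toy vectors, the function names, and the language placeholders `L1`..`L5` are hypothetical and not taken from the paper; the paper's three mapping methods are not reproduced here.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mate_ranks(source_vecs, target_vecs):
    """For each source document i, return the 1-based rank of its mate
    (target document i) among all target documents."""
    ranks = []
    for i, query in enumerate(source_vecs):
        sims = [cosine(query, t) for t in target_vecs]
        # Rank = 1 + number of targets strictly more similar than the mate.
        ranks.append(1 + sum(1 for s in sims if s > sims[i]))
    return ranks


def mate_retrieval_rate(ranks):
    """Fraction of queries whose mate is the top-ranked target."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the mate over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical toy embeddings: row i of `source` and row i of `target`
# stand for the same document in two languages.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]

ranks = mate_ranks(source, target)
print("ranks:", ranks)
print("mate retrieval rate:", mate_retrieval_rate(ranks))
print("mean reciprocal rank:", mean_reciprocal_rank(ranks))

# Five languages give 5 * 4 = 20 ordered (query, target) pairs, which is
# consistent with the 20 language pairs mentioned above (placeholder codes).
languages = ["L1", "L2", "L3", "L4", "L5"]
pairs = list(permutations(languages, 2))
print("language pairs:", len(pairs))
```

In this toy setup the second query ranks its mate second, so the retrieval rate is 2/3 while the mean reciprocal rank still gives partial credit for the near miss. Comparing mapped and non-mapped representations would amount to calling `mate_ranks` once on each set of vectors.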

Code Implementations: 1 repo