CL, Jun 13, 2024

Investigating the translation capabilities of Large Language Models trained on parallel data only

arXiv:2406.09140v1, 3 citations
Originality: Incremental advance
AI Analysis

This work examines how well LLMs trained only on parallel data can translate, a question of practical interest to NLP practitioners. It is an incremental advance: it builds on existing LLM methods and narrows the focus to parallel-data-only training.

The researchers trained large language models solely on parallel data for machine translation, introducing the PLUME models, which perform comparably to previous encoder-decoder architectures on 16 supervised and 56 zero-shot translation directions.

In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space.
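The direction counts in the abstract are consistent with a nine-language setup: Catalan paired with eight other languages gives 16 supervised directions (both ways), and the ordered pairs among the remaining eight give 56 zero-shot directions. The minimal sketch below reproduces that arithmetic. The figure of eight other languages is inferred from the counts, not stated in the abstract, and the language codes other than `ca` are hypothetical placeholders.

```python
from itertools import permutations

# Assumption: Catalan ("ca") plus eight other languages. The abstract does not
# name them, so the codes below are placeholders, not the paper's actual set.
PIVOT = "ca"
OTHERS = [f"lang{i}" for i in range(1, 9)]


def split_directions(pivot, others):
    """Split all ordered language pairs into pivot-centric and non-pivot sets."""
    languages = [pivot, *others]
    supervised, zero_shot = [], []
    for src, tgt in permutations(languages, 2):
        if pivot in (src, tgt):
            # Pairs that include the pivot are covered by the parallel data.
            supervised.append((src, tgt))
        else:
            # Pairs without the pivot are never seen together in training.
            zero_shot.append((src, tgt))
    return supervised, zero_shot


supervised, zero_shot = split_directions(PIVOT, OTHERS)
print(len(supervised), len(zero_shot))  # 16 56
```

Under this assumption, the two counts sum to 72, which is every ordered pair among nine languages.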

Code Implementations: 1 repo
