CL | Feb 27, 2024

Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza

arXiv:2402.17733v1 | 32.3263 citations | h-index: 22 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for specialized models in translation workflows, offering an incremental improvement over existing open approaches.

The authors adapt open large language models to multiple translation-related tasks. The resulting model surpasses open alternatives on several of these tasks and is competitive with closed general-purpose models.

While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on our benchmark.
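The continued-pretraining step in the abstract mixes monolingual and parallel data. The following is a minimal sketch of what sampling such a mixture could look like, not the authors' pipeline. The function name `build_mixture`, the `parallel_fraction` knob, and the newline-joined format for sentence pairs are all hypothetical; the abstract does not state the actual mixing ratio or data format.

```python
import random


def build_mixture(monolingual, parallel, parallel_fraction, n_samples, seed=0):
    """Sample a training stream that mixes monolingual text and parallel pairs.

    monolingual: list of strings (documents in a single language).
    parallel: list of (source, target) string pairs.
    parallel_fraction: hypothetical probability of drawing a parallel pair;
        the real ratio is not given in the abstract.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the stream is reproducible
    stream = []
    for _ in range(n_samples):
        if rng.random() < parallel_fraction:
            src, tgt = rng.choice(parallel)
            # Assumed format: source and target joined by a newline.
            stream.append(f"{src}\n{tgt}")
        else:
            stream.append(rng.choice(monolingual))
    return stream


mono = ["Bom dia a todos.", "Good morning, everyone."]
pairs = [("Good morning.", "Bom dia."), ("Thank you.", "Obrigado.")]
sample = build_mixture(mono, pairs, parallel_fraction=0.5, n_samples=4)
```

Under these assumptions, each element of `sample` is either a monolingual document or a source/target pair rendered as one training example; the instruction-finetuning stage described in the abstract would follow as a separate step.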
