CL · Apr 23, 2025

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

arXiv:2504.17025v1 · 12 citations · h-index: 30 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses inefficient encoding and slower inference when LLMs process Italian. The contribution is incremental, since it builds on existing vocabulary adaptation techniques.

The paper tackles the inefficiency of English-centric LLMs on non-English text by proposing Semantic Alignment Vocabulary Adaptation (SAVA) to optimize them for Italian. This yields a 25% reduction in token fertility for Mistral-7b-v0.1 and a reduction of 1 billion parameters for Llama-3.1-8B, while maintaining competitive performance.
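Both headline figures reduce to simple counts. Token fertility is commonly defined as the average number of tokens a tokenizer emits per word, and a vocabulary swap saves parameters because the input embedding matrix and the output LM head both scale with vocabulary size. The sketch below computes both; the chunking tokenizers are hypothetical stand-ins, not the models' real tokenizers. The Llama-3.1-8B sizes (vocabulary 128,256, hidden size 4,096, untied embeddings) come from its public configuration; the abstract does not state the adapted vocabulary size, so none is assumed.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], words: List[str]) -> float:
    """Average number of tokens produced per word (lower = more efficient)."""
    if not words:
        raise ValueError("fertility is undefined for an empty word list")
    return sum(len(tokenize(w)) for w in words) / len(words)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop, e.g. 0.25 for a 25% reduction."""
    return (before - after) / before


def vocab_params(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Parameters in the input embedding matrix plus the output LM head.

    With untied weights these are two separate vocab_size x hidden_size matrices.
    """
    return vocab_size * hidden_size * (1 if tied else 2)


def chunker(size: int) -> Callable[[str], List[str]]:
    """Hypothetical toy tokenizer: split each word into fixed-size chunks."""
    return lambda w: [w[i:i + size] for i in range(0, len(w), size)]


words = "la gatta dormiva sul divano".split()
before = fertility(chunker(3), words)  # 9 tokens / 5 words = 1.8
after = fertility(chunker(4), words)   # 8 tokens / 5 words = 1.6
print(f"{before:.2f} -> {after:.2f} ({relative_reduction(before, after):.1%} lower)")

# Llama-3.1-8B public config: vocab 128,256, hidden size 4,096, untied weights.
print(vocab_params(128_256, 4_096, tied=False))  # 1050673152
```

With untied weights, the two vocabulary-dependent matrices of Llama-3.1-8B alone hold about 1.05 billion parameters, which is why shrinking the vocabulary is the lever for a parameter reduction of this size. In practice fertility is measured over a large held-out corpus rather than a single sentence.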

The number of pretrained Large Language Models (LLMs) is increasing steadily, though the majority are designed predominantly for the English language. While state-of-the-art LLMs can handle other languages, due to language contamination or some degree of multilingual pretraining data, they are not optimized for non-English languages, leading to inefficient encoding (high token "fertility") and slower inference speed. In this work, we thoroughly compare a variety of vocabulary adaptation techniques for optimizing English LLMs for the Italian language, and put forward Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that leverages neural mapping for vocabulary substitution. SAVA achieves competitive performance across multiple downstream tasks, enhancing grounded alignment strategies. We adapt two LLMs: Mistral-7b-v0.1, reducing token fertility by 25%, and Llama-3.1-8B, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, following the adaptation of the vocabulary, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the capabilities of the adapted models on various multi-choice and generative tasks.
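The abstract describes SAVA only as leveraging neural mapping for vocabulary substitution. The sketch below illustrates that general idea under our own assumptions, not the paper's stated procedure: a hypothetical helper model that already has an Italian vocabulary, and a single linear layer (fitted in closed form by least squares) standing in for the neural mapping. The map is learned on tokens shared by both vocabularies, projecting helper embeddings into the source LLM's embedding space. Shared tokens keep their original vectors, and new tokens are initialized from their mapped helper embeddings. All names and toy vectors are illustrative.

```python
from typing import Dict, List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more shared tokens")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear_map(helper: Matrix, source: Matrix) -> Matrix:
    """Least-squares W minimizing ||helper @ W - source|| via the normal equations."""
    ht = transpose(helper)
    return solve(matmul(ht, helper), matmul(ht, source))


def init_new_vocab(
    new_vocab: List[str],
    helper_emb: Dict[str, List[float]],
    source_emb: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Build the embedding table for the substituted vocabulary.

    Assumes every token of new_vocab exists in the (hypothetical) helper model.
    """
    shared = [t for t in new_vocab if t in source_emb]
    w = fit_linear_map([helper_emb[t] for t in shared], [source_emb[t] for t in shared])
    table = {}
    for tok in new_vocab:
        if tok in source_emb:
            table[tok] = source_emb[tok][:]  # shared token: keep the original vector
        else:
            table[tok] = matmul([helper_emb[tok]], w)[0]  # new token: mapped helper vector
    return table


# Toy data: source vectors were generated as helper @ [[2, 0], [1, 1]].
helper = {"di": [1.0, 0.0], "la": [0.0, 1.0], "che": [1.0, 1.0], "gatto": [2.0, 1.0]}
source = {"di": [2.0, 0.0], "la": [1.0, 1.0], "che": [3.0, 1.0], "the": [0.5, 0.5]}
new_vocab = ["di", "la", "che", "gatto"]
table = init_new_vocab(new_vocab, helper, source)
print(table["gatto"])  # [5.0, 1.0]
```

Tokens absent from the new vocabulary (here the English "the") are dropped, which is where the parameter savings come from. A real adaptation would apply a similar initialization to the output head when it is untied and would use a far larger set of shared tokens than embedding dimensions; the abstract notes that continual training on Italian is then needed to recover performance.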
