CL | Jan 22, 2025

Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation

arXiv:2501.12660v1 | 20 citations | h-index: 36 | COLING Workshops
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance trade-offs for low-resource language applications, representing an incremental improvement in model optimization.

The paper tackles the cost of using Massively Multilingual Transformers for low-resource languages by applying knowledge distillation to produce smaller, single-language models. These models perform on par with strong baselines on benchmark tasks while being more efficient.

In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs), alleviating the tradeoffs associated with using such models in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines on a variety of benchmark tasks while being much more efficient. Furthermore, we investigate additional steps during the distillation process that improve the soft supervision of the target language, and we provide a number of analyses and ablations to show the efficacy of the proposed method.
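
To make the idea of soft supervision concrete, here is a minimal sketch of a generic soft-target distillation loss, in which a student is trained to match a teacher's temperature-softened output distribution. This is an illustration under assumptions, not the paper's actual objective: the abstract does not specify the loss, and the function names (`softmax`, `log_softmax`, `soft_target_loss`) and the `temperature` parameter are hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature=1.0):
    """Log-probabilities computed without taking the log of tiny numbers."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]


def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution.

    Assumption: a generic soft-target objective; the temperature value is
    illustrative and not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


# Hypothetical logits over a three-token vocabulary for one position.
teacher = [2.0, 0.5, -1.0]
aligned_student = [2.0, 0.5, -1.0]
misaligned_student = [-1.0, 0.5, 2.0]

print(soft_target_loss(teacher, aligned_student))
print(soft_target_loss(teacher, misaligned_student))
```

The loss is smallest when the student reproduces the teacher's distribution, so the aligned student scores lower than the misaligned one. In an actual training setup this quantity would be averaged over token positions and minimized with respect to the student's parameters.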

