CL, LG · Dec 16, 2025

TiME: Tiny Monolingual Encoders for Efficient NLP Pipelines

David Schulmeister, Valentin Hartmann, Lars Klein, Robert West

arXiv:2512.14645v1 · 2.7 · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the efficiency and sustainability problems faced by NLP practitioners who deploy models in resource-constrained settings, such as real-time applications or battery-powered devices. Its contribution is incremental: it optimizes existing methods rather than introducing new ones.

The paper tackles the inefficiency of large language models in NLP pipelines by introducing TiME (Tiny Monolingual Encoders), small encoders trained with distillation. These models achieve a better trade-off between benchmark performance and efficiency metrics such as throughput, latency, and energy consumption.

Today, a lot of research on language models is focused on large, general-purpose models. However, many NLP pipelines only require models with a well-defined, small set of capabilities. While large models are capable of performing the tasks of those smaller models, they are simply not fast enough to process large amounts of data or offer real-time responses. Furthermore, they often use unnecessarily large amounts of energy, leading to sustainability concerns and problems when deploying them on battery-powered devices. In our work, we show how to train small models for such efficiency-critical applications. As opposed to many off-the-shelf NLP pipelines, our models use modern training techniques such as distillation, and offer support for low-resource languages. We call our models TiME (Tiny Monolingual Encoders) and comprehensively evaluate them on a range of common NLP tasks, observing an improved trade-off between benchmark performance on the one hand and throughput, latency, and energy consumption on the other. Along the way, we show that it is possible to distill monolingual models from multilingual teachers, and likewise to distill models with absolute positional embeddings from teachers that use relative positional embeddings.
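The abstract names distillation as the core training technique but does not specify the objective used for TiME. As a generic illustration only, the sketch below implements the classic soft-target distillation loss popularized by Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution, mixed with ordinary cross-entropy on the true label. The function names, `alpha`, and `temperature` defaults are hypothetical choices for this sketch, not the paper's. Note that logit matching of this kind assumes teacher and student share an output space; when a monolingual student uses a different vocabulary than its multilingual teacher, other distillation targets (for example, intermediate representations) would be needed.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    (as in Hinton et al., 2015). Generic sketch, not TiME's exact objective.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def cross_entropy(logits, label):
    """Standard cross-entropy of the student's prediction against a hard label."""
    return -math.log(softmax(logits)[label])


def distillation_objective(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Hypothetical weighted mix of soft-target (teacher) and hard-label losses."""
    soft = kd_loss(student_logits, teacher_logits, temperature)
    hard = cross_entropy(student_logits, label)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [3.0, 2.0, 1.0]
    print("KD loss:", kd_loss(student, teacher))
    print("combined:", distillation_objective(student, teacher, label=2))
```

A higher temperature flattens the teacher's distribution so the student also learns from the relative scores of incorrect classes; `alpha` trades that signal off against fitting the gold labels directly.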

