CL · Sep 30, 2025

Vocabulary Customization for Efficient Domain-Specific LLM Deployment

Christian Herold, Michael Kozielski, Nicholas Santavas, Yannick Versley, Shahram Khadivi

arXiv:2509.26124v1 · 1 citation · h-index: 22

Originality: Incremental advance

AI Analysis

The work addresses efficiency issues for users deploying LLMs in specialized domains such as e-commerce, but it is incremental because it builds on existing tokenization methods.

The paper tackles vocabulary mismatch in LLMs applied to domain-specific text, which inflates the number of tokens per input and so slows processing. It augments the pretrained vocabulary with domain-specific tokens, yielding up to 20% shorter input sequences and lower inference latency while preserving predictive quality.

When using an LLM to process text outside the training domain(s), an often overlooked factor is vocabulary mismatch, where the general-domain tokenizer fails to capture frequent domain-specific terms, leading to higher token fertility and thus a decrease in processing speed due to suboptimal sub-word splits. We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens. To this end, we design an algorithm that extends an existing tokenizer while guaranteeing it never decreases tokenization efficiency: every input sequence is segmented into at most the same number of tokens as before. Evaluated on real-world e-Commerce use-cases, the augmented tokenizer significantly shortens input sequences by up to 20% and reduces inference latency on downstream tasks while preserving predictive quality. We further analyze secondary effects, such as the impact on forward pass speed and the rate at which the model adopts the newly introduced tokens, to illustrate the broader benefits of vocabulary adaptation.
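To make the "never decreases tokenization efficiency" property concrete, here is a minimal toy sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a BPE-style tokenizer defined by an ordered merge list and appends new, domain-derived merges at the lowest priority. Because the appended merges can only fire once no original merge applies, encoding first reproduces the original segmentation and then only joins adjacent tokens, so the token count cannot grow. The names `bpe_encode`, `extend_merges`, `base_merges`, and `corpus` are hypothetical and chosen for illustration.

```python
from collections import Counter

def bpe_encode(word, merges):
    """Segment `word` by repeatedly applying the highest-priority merge.

    `merges` is an ordered list of token pairs; earlier entries win.
    """
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # Rank every adjacent pair; unknown pairs get infinite rank.
        candidates = [
            (rank.get(pair, float("inf")), i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no applicable merge is left
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

def extend_merges(base_merges, corpus, num_new):
    """Append up to `num_new` domain merges AFTER the base merges.

    Each new merge is the most frequent adjacent token pair in the
    domain corpus under the current (base + already added) merges.
    """
    merges = list(base_merges)
    for _ in range(num_new):
        counts = Counter()
        for word in corpus:
            toks = bpe_encode(word, merges)
            counts.update(zip(toks, toks[1:]))
        if not counts:
            break  # every word is already a single token
        merges.append(counts.most_common(1)[0][0])
    return merges

# Hypothetical toy data: a tiny "general-domain" merge list and a
# small e-commerce-flavoured corpus.
base_merges = [("p", "h"), ("o", "n")]
corpus = ["smartphone", "smartwatch", "smartphone", "headphone"]

extended = extend_merges(base_merges, corpus, num_new=6)

for w in sorted(set(corpus)):
    before = bpe_encode(w, base_merges)
    after = bpe_encode(w, extended)
    print(f"{w}: {len(before)} -> {len(after)} tokens")
```

In a real deployment the model's embedding matrix would also need rows for the new tokens and some continued training so that the model actually uses them. The abstract describes this adoption rate as one of the secondary effects the authors analyze.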
