CL, AI, LG · Aug 21, 2025

VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models

arXiv:2508.15229v1 · 2 citations · h-index: 26
Originality: Highly original
AI Analysis

This work addresses memory constraints when deploying SLMs on resource-constrained edge devices, offering a novel solution that goes beyond incremental static pruning.

The paper tackles the memory bottleneck in Small Language Models (SLMs) on edge devices by introducing VocabTailor, a dynamic vocabulary selection framework that reduces memory usage of vocabulary-related components by up to 99% with minimal performance degradation.

Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge device deployment. A substantial portion of SLMs' memory footprint stems from vocabulary-related components, particularly embeddings and language modeling (LM) heads, due to large vocabulary sizes. Existing static vocabulary pruning, while reducing memory usage, suffers from rigid, one-size-fits-all designs that cause information loss from the prefill stage and a lack of flexibility. In this work, we identify two key principles underlying the vocabulary reduction challenge: the lexical locality principle (the observation that only a small subset of tokens is required during any single inference) and the asymmetry in computational characteristics between the vocabulary-related components of SLMs. Based on these insights, we introduce VocabTailor, a novel decoupled dynamic vocabulary selection framework that addresses memory constraints by offloading the embeddings and applying a hybrid static-dynamic vocabulary selection strategy to the LM head, enabling on-demand loading of vocabulary components. Comprehensive experiments across diverse downstream tasks demonstrate that VocabTailor achieves a reduction of up to 99% in the memory usage of vocabulary-related components with minimal or no degradation in task performance, substantially outperforming existing static vocabulary pruning.
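
To make the lexical locality idea concrete, here is a minimal toy sketch of a hybrid static-dynamic selection for the LM head. It is not the paper's implementation: the abstract does not say how the static and dynamic parts are built, so this sketch assumes the kept vocabulary is the union of a fixed task-level token set and the tokens present in the current prompt. All names (`select_vocab`, `reduced_logits`, `greedy_next_token`) and all sizes are hypothetical, and the weights are random.

```python
import random


def select_vocab(static_ids, prompt_ids):
    """Assumed hybrid rule: fixed task-level ids plus ids seen in this prompt."""
    return sorted(set(static_ids) | set(prompt_ids))


def reduced_logits(hidden, lm_head, kept_ids):
    """Compute logits only for the kept rows of the LM head matrix."""
    return {
        tid: sum(h * w for h, w in zip(hidden, lm_head[tid]))
        for tid in kept_ids
    }


def greedy_next_token(hidden, lm_head, kept_ids):
    """Pick the highest-scoring token among the kept ids only."""
    logits = reduced_logits(hidden, lm_head, kept_ids)
    return max(logits, key=logits.get)


# Toy setup with hypothetical sizes; real SLM vocabularies are far larger.
random.seed(0)
VOCAB_SIZE, HIDDEN_DIM = 1000, 8
lm_head = [
    [random.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
    for _ in range(VOCAB_SIZE)
]

static_ids = range(20)            # assumed task-level static vocabulary
prompt_ids = [5, 417, 902, 417]   # assumed token ids of the current input
kept_ids = select_vocab(static_ids, prompt_ids)

hidden = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
next_id = greedy_next_token(hidden, lm_head, kept_ids)

# Fraction of LM head rows that must be resident for this inference.
fraction_loaded = len(kept_ids) / VOCAB_SIZE
```

In this toy case only 22 of 1000 LM head rows are touched for the inference, which illustrates why loading rows on demand can shrink the resident footprint. The cost is that any token outside the kept set cannot be generated, so the quality of the selection rule matters.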

