LG | CL | Feb 2, 2024

Need a Small Specialized Language Model? Plan Early!

arXiv:2402.01093v2 | 5 citations | h-index: 39
AI Analysis

The paper addresses the need for efficient inference in resource-constrained settings and offers incremental improvements for domain-specific applications.

The paper tackles the problem of creating efficient small language models for specialized domains by proposing two solutions: importance sampling for task-specific pretraining and projected networks for cheap adaptation of a single pretrained model, demonstrating empirical effectiveness across various domains and budgets.

Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference, but their lower capacity means that their performance can be good only if one limits their scope to a specialized domain. This paper explores how to get good specialized small language models using a large, generic, pretraining set and a limited amount of specialized data. We consider two scenarios, depending on whether (i) one can afford pretraining a model for each specialization task, or (ii) one wants to cheaply adapt a single pretrained model for each task. In the first scenario, we propose an effective solution based on importance sampling: we resample the pretraining set to imitate the specialization data and train a small model on it. In the second scenario, we propose a novel architecture, projected networks (PN). PN is a large network whose parameters can be linearly projected into a small network for specialization. For both scenarios, we demonstrate the empirical effectiveness of our solutions across various domains, training set sizes, and training budgets.
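To make the resampling idea in the first scenario concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract only says the pretraining set is resampled to imitate the specialization data. The sketch assumes every document has already been assigned a discrete cluster id (a hypothetical preprocessing step, for example clustering of document embeddings), and all function and variable names are illustrative. Generic documents are drawn with probability proportional to the ratio of specialized to generic cluster frequencies.

```python
import random
from collections import Counter


def cluster_importance_weights(generic_clusters, specialized_clusters):
    """Return w[c] = p_specialized(c) / p_generic(c) for each cluster c seen in the generic set.

    Assumption: cluster ids are precomputed; how they are obtained is outside this sketch.
    """
    generic_counts = Counter(generic_clusters)
    specialized_counts = Counter(specialized_clusters)
    n_generic = len(generic_clusters)
    n_specialized = len(specialized_clusters)

    weights = {}
    for cluster, count in generic_counts.items():
        p_generic = count / n_generic
        p_specialized = specialized_counts.get(cluster, 0) / n_specialized
        weights[cluster] = p_specialized / p_generic
    return weights


def resample_generic(generic_clusters, specialized_clusters, num_samples, seed=0):
    """Sample indices of generic documents (with replacement) by importance weight.

    Raises ValueError if no generic cluster overlaps the specialized data,
    because all weights would then be zero.
    """
    weights = cluster_importance_weights(generic_clusters, specialized_clusters)
    per_document_weights = [weights[c] for c in generic_clusters]
    rng = random.Random(seed)
    return rng.choices(
        range(len(generic_clusters)), weights=per_document_weights, k=num_samples
    )


# Toy data: the generic set is dominated by cluster 0,
# while the specialized set is mostly cluster 2.
generic_clusters = [0] * 60 + [1] * 30 + [2] * 10
specialized_clusters = [2] * 8 + [1] * 2

weights = cluster_importance_weights(generic_clusters, specialized_clusters)
sampled = resample_generic(generic_clusters, specialized_clusters, num_samples=10_000)
resampled_histogram = Counter(generic_clusters[i] for i in sampled)
```

In this toy setup the resampled cluster histogram should approach that of the specialized set, so a small model trained on the resampled documents would see a generic-data mixture shaped like the target domain. Sampling with replacement is a simplification chosen for brevity.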

