AI · LG · Jun 16, 2025

Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts

arXiv:2506.17289v2 · 2 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work provides practical guidance for model selection in low-data regimes, a recurring concern for NLP researchers and practitioners, though it is incremental in that it compares existing methods.

The paper compares few-shot prompting and supervised fine-tuning for small language models, analyzing their generalization and representation stability across in-distribution and out-of-distribution settings, with findings that highlight differences in how models internalize knowledge.

We investigate the generalization capabilities of small language models under two popular adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is often favored for its parameter efficiency and flexibility, it remains unclear how robust this approach is in low-resource settings and under distributional shifts. This paper presents a comparative study of prompting and fine-tuning across task formats, prompt styles, and model scales, with a focus on their behavior in both in-distribution and out-of-distribution (OOD) settings. Beyond accuracy, we analyze the internal representations learned by each approach to assess the stability and abstraction of task-specific features. Our findings highlight critical differences in how small models internalize and generalize knowledge under different adaptation strategies. This work offers practical guidance for model selection in low-data regimes and contributes empirical insight into the ongoing debate over prompting versus fine-tuning. Code for the experiments is available at the following

