CL AI LG

Oct 7, 2023

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

arXiv:2310.04680v12.55 citations

h-index: 32

Originality: Incremental advance

AI Analysis

This research examines how scaling affects the core capabilities of LLMs, giving AI researchers and practitioners insight into the trade-offs involved in making models smaller and more efficient.

The study investigated how scaling the number of parameters in large language models affects fact recall and in-context learning. It found that reducing model size by more than 30% significantly impairs fact recall, whereas in-context learning is largely preserved even at reductions of 60-70%.

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques, weight pruning and simply training a smaller or larger model (which we refer to as dense scaling), and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
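To make the size reductions in the abstract concrete, here is a minimal sketch of what pruning a weight vector to 30% and 70% sparsity can look like. The abstract does not say which pruning algorithm was used, so the sketch assumes plain magnitude pruning (zeroing the smallest-magnitude weights) purely for illustration. The names `magnitude_prune` and `nonzero_fraction` and the toy weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Assumption: simple magnitude pruning, chosen only for illustration;
    it is not claimed to be the method used in the paper.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def nonzero_fraction(weights):
    """Fraction of weights that remain nonzero after pruning."""
    return sum(1 for w in weights if w != 0.0) / len(weights)


# Toy weight vector (hypothetical values).
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08, 0.6, -0.15]

light = magnitude_prune(weights, 0.3)  # 30% of weights removed
heavy = magnitude_prune(weights, 0.7)  # 70% of weights removed

print(light, nonzero_fraction(light))
print(heavy, nonzero_fraction(heavy))
```

In the abstract's terms, `light` sits near the point beyond which fact recall starts to degrade, while `heavy` falls in the range where in-context processing is reported to be largely preserved. Dense scaling, which trains a smaller model from scratch, is not shown here.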
