CL, AI, LG | Sep 24, 2024

Small Language Models: Survey, Measurements, and Insights

Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu

Cambridge

arXiv:2409.15790v3 | h-index: 17 | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for more accessible and efficient AI on smart devices by providing a comprehensive analysis of small language models. Its contribution is incremental, since it synthesizes existing research rather than proposing a new model or method.

The paper surveys 70 open-source small language models (100M-5B parameters) to analyze their technical innovations and evaluate their capabilities in domains like reasoning and mathematics, while benchmarking inference latency and memory footprints for on-device use.

Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the capabilities of LLMs in the pursuit of artificial general intelligence, SLM research aims to make machine intelligence more accessible, affordable, and efficient for everyday tasks. Focusing on transformer-based, decoder-only language models with 100M-5B parameters, we survey 70 state-of-the-art open-source SLMs, analyzing their technical innovations across three axes: architectures, training datasets, and training algorithms. In addition, we evaluate their capabilities in various domains, including commonsense reasoning, mathematics, in-context learning, and long context. To gain further insight into their on-device runtime costs, we benchmark their inference latency and memory footprints. Through in-depth analysis of our benchmarking data, we offer valuable insights to advance research in this field.

