Aug 21, 2025

SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version

arXiv:2508.15478v2 | 4 citations | h-index: 18 | Has Code | EMNLP
Originality: Synthesis-oriented
AI Analysis

SLM-Bench provides a standardized evaluation framework for researchers and practitioners working with resource-efficient language models. The contribution is incremental, however, since it extends existing benchmarking practices to the specific domain of SLMs.

The authors address the lack of systematic evaluation of Small Language Models (SLMs) by introducing SLM-Bench, the first comprehensive benchmark for SLMs. It assesses 15 SLMs on 9 NLP tasks using 23 datasets and reveals diverse trade-offs between accuracy and energy efficiency.

Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to assess SLMs across multiple dimensions, including accuracy, computational efficiency, and sustainability metrics. SLM-Bench evaluates 15 SLMs on 9 NLP tasks using 23 datasets spanning 14 domains. The evaluation is conducted on 4 hardware configurations, providing a rigorous comparison of their effectiveness. Unlike prior benchmarks, SLM-Bench quantifies 11 metrics across correctness, computation, and consumption, enabling a holistic assessment of efficiency trade-offs. Our evaluation considers controlled hardware conditions, ensuring fair comparisons across models. We develop an open-source benchmarking pipeline with standardized evaluation protocols to facilitate reproducibility and further research. Our findings highlight the diverse trade-offs among SLMs, where some models excel in accuracy while others achieve superior energy efficiency. SLM-Bench sets a new standard for SLM evaluation, bridging the gap between resource efficiency and real-world applicability.
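One common way to make an accuracy-versus-energy trade-off concrete is to keep only the models that no other model beats on both axes at once (a Pareto front). The sketch below is an illustration under stated assumptions, not the SLM-Bench pipeline: the `ModelResult` type, the `pareto_front` function, the model names, and all numbers are hypothetical placeholders, and it assumes a single accuracy score (higher is better) and a single energy figure in kWh (lower is better) per model, whereas SLM-Bench reports 11 metrics.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    """Hypothetical per-model summary; not the SLM-Bench schema."""
    name: str
    accuracy: float    # assumed: higher is better
    energy_kwh: float  # assumed: energy per evaluation run, lower is better


def dominates(a: ModelResult, b: ModelResult) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_kwh <= b.energy_kwh
    strictly_better = a.accuracy > b.accuracy or a.energy_kwh < b.energy_kwh
    return no_worse and strictly_better


def pareto_front(results: List[ModelResult]) -> List[ModelResult]:
    """Return the non-dominated models, ordered from least to most energy used."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.energy_kwh)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are NOT SLM-Bench results.
    results = [
        ModelResult("model-a", 0.72, 0.50),
        ModelResult("model-b", 0.65, 0.20),
        ModelResult("model-c", 0.60, 0.30),
        ModelResult("model-d", 0.75, 0.90),
    ]
    for r in pareto_front(results):
        print(r.name, r.accuracy, r.energy_kwh)
```

With these placeholder values, `model-c` drops out because `model-b` is both more accurate and cheaper. The remaining three models form the trade-off curve, running from the most energy-efficient to the most accurate. This mirrors the pattern the abstract describes, where some models excel in accuracy and others in energy efficiency.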
