SE AI CL | Oct 24, 2025

Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification

Mohammad Amin Zadenoori, Vincenzo De Martino, Jacek Dabrowski, Xavier Franch, Alessio Ferrari

arXiv:2510.21443v18.02 citations | h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses the trade-off between performance and resource use in requirements engineering for practitioners, offering an incremental comparison that supports small models as a viable alternative.

The study compared small and large language models for requirements classification, finding that small models achieve similar performance (within 2% F1 score, not statistically significant) despite being up to 300 times smaller, with dataset characteristics influencing results more than model size.

[Context and motivation] Large language models (LLMs) show notable results in natural language processing (NLP) tasks for requirements engineering (RE). However, their use is compromised by high computational cost, data-sharing risks, and dependence on external services. In contrast, small language models (SLMs) offer a lightweight, locally deployable alternative. [Question/problem] It remains unclear how well SLMs perform compared to LLMs in RE tasks in terms of accuracy. [Results] Our preliminary study compares eight models, three LLMs and five SLMs, on requirements classification tasks using the PROMISE, PROMISE Reclass, and SecReq datasets. Our results show that although LLMs achieve an average F1 score 2% higher than SLMs, this difference is not statistically significant. SLMs come close to LLM performance across all datasets and even outperform LLMs in recall on the PROMISE Reclass dataset, despite being up to 300 times smaller. We also found that dataset characteristics play a more significant role in performance than model size. [Contribution] Our study contributes evidence that SLMs are a valid alternative to LLMs for requirements classification, offering advantages in privacy, cost, and local deployability.
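The abstract reports both F1 and recall, and notes that a model can lead on recall while trailing on F1. The following is a minimal sketch of how these metrics relate for a binary requirements classifier. The function name `precision_recall_f1` and the toy labels are hypothetical, invented for illustration; they are not taken from the paper or its datasets.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, made up for this example (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 0, 0]  # finds every positive, with two false alarms
model_b = [1, 1, 0, 0, 0, 1, 0, 0]  # no false alarms, but misses one positive

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    p, r, f = precision_recall_f1(y_true, preds)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

On this toy data, `model_a` has the higher recall while `model_b` has the higher F1, because F1 is the harmonic mean of precision and recall and penalizes the false positives that `model_a` produces.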
