Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models
The paper provides quantitative foundations for deploying small models in resource-constrained production environments, where inference efficiency matters more than marginal accuracy gains.
This paper addresses the high computational cost of large language models through a task-specific efficiency analysis, which finds that small models (0.5-3B parameters) achieve superior Performance-Efficiency Ratio scores across five NLP tasks.
Large Language Models achieve remarkable performance but incur substantial computational costs that make them unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric that integrates accuracy, throughput, memory, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5-3B parameters) achieve superior PER scores across all five tasks. These findings establish quantitative foundations for deploying small models in production environments that prioritize inference efficiency over marginal accuracy gains.
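The abstract names the four ingredients of PER and says they are combined by a geometric mean, but it does not give the formula. The sketch below is one plausible reading, not the paper's definition. It assumes each metric is scaled to (0, 1] relative to the best model in the comparison set: value / max for accuracy and throughput, and min / value for memory and latency. It then takes the geometric mean of the four scaled values. The function names, the dictionary keys, and the normalization scheme are all hypothetical.

```python
from statistics import geometric_mean


def scale(values, higher_is_better):
    """Scale positive values to (0, 1] relative to the best value in the set.

    Assumption: ratio-to-best scaling. The paper may normalize differently.
    """
    if higher_is_better:
        best = max(values)
        return [v / best for v in values]
    best = min(values)
    return [best / v for v in values]


def per_scores(models):
    """Return a hypothetical PER per model: geometric mean of four scaled metrics.

    `models` maps a model name to a dict with the (assumed) keys
    accuracy, throughput, memory_gb, latency_ms. All values must be positive.
    """
    names = list(models)
    columns = [
        scale([models[n]["accuracy"] for n in names], higher_is_better=True),
        scale([models[n]["throughput"] for n in names], higher_is_better=True),
        scale([models[n]["memory_gb"] for n in names], higher_is_better=False),
        scale([models[n]["latency_ms"] for n in names], higher_is_better=False),
    ]
    # zip(*columns) yields one tuple of four scaled metrics per model.
    return {n: geometric_mean(row) for n, row in zip(names, zip(*columns))}


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not results from the paper.
    example = {
        "small": {"accuracy": 0.80, "throughput": 100.0, "memory_gb": 2.0, "latency_ms": 20.0},
        "large": {"accuracy": 0.90, "throughput": 20.0, "memory_gb": 16.0, "latency_ms": 100.0},
    }
    for name, score in per_scores(example).items():
        print(f"{name}: PER = {score:.3f}")
```

Under this reading, a geometric mean penalizes a model that is weak on any single axis more strongly than an arithmetic mean would, so a modest accuracy deficit can be outweighed by large gains in throughput, memory, and latency.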