CL · AI · Feb 23, 2025

Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi

arXiv:2502.16721v1 · 6.78 citations · h-index: 13 · Computer

Originality: Synthesis-oriented

AI Analysis

This research addresses the need of developers and researchers for more nuanced speed benchmarks of LLMs. Its contribution is incremental, however: it offers a comparative analysis rather than new methods.

The study measured the speed of popular open-weights large language models on GPUs and found that speed varies significantly with the specific task, so token generation rate alone does not capture it.

This paper studies the speed of open-weights large language models (LLMs) running on GPUs and how that speed depends on the task at hand, presenting a comparative analysis of the speed of the most popular open LLMs.
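Why tokens per second alone can mislead is easy to see with a deliberately simple latency model. The sketch below assumes (this is not the paper's methodology) that a request costs a prefill phase proportional to the prompt length plus a decode phase proportional to the number of generated tokens. The throughput figures `PREFILL_TPS` and `DECODE_TPS`, the `Task` examples, and the helper names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prompt_tokens: int  # tokens the model must read (prefill phase)
    output_tokens: int  # tokens the model must generate (decode phase)


def estimated_latency_s(task: Task, prefill_tps: float, decode_tps: float) -> float:
    """Assumed two-phase model: end-to-end time = prefill time + decode time."""
    prefill_time = task.prompt_tokens / prefill_tps
    decode_time = task.output_tokens / decode_tps
    return prefill_time + decode_time


def effective_output_tps(task: Task, prefill_tps: float, decode_tps: float) -> float:
    """Generated tokens divided by total request time, as a user would perceive it."""
    return task.output_tokens / estimated_latency_s(task, prefill_tps, decode_tps)


# Hypothetical throughputs for one model on one GPU (illustrative, not measured).
PREFILL_TPS = 5000.0
DECODE_TPS = 50.0

TASKS = [
    Task("summarize a long document", prompt_tokens=4000, output_tokens=200),
    Task("short question, short answer", prompt_tokens=50, output_tokens=50),
    Task("write a long essay", prompt_tokens=50, output_tokens=1000),
]

if __name__ == "__main__":
    for t in TASKS:
        total = estimated_latency_s(t, PREFILL_TPS, DECODE_TPS)
        eff = effective_output_tps(t, PREFILL_TPS, DECODE_TPS)
        print(f"{t.name}: {total:.2f} s total, {eff:.1f} output tokens/s")
```

Under these assumed numbers the model has the same nominal decode rate of 50 tokens/s for every task, yet the summarization request takes 4.80 s and delivers only about 41.7 output tokens/s, while the essay takes 20.01 s. Real inference stacks add batching, caching, and memory effects that this model ignores.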

