AI CL Sep 21, 2023

Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam

Matheus L. O. Santos, Cláudio E. C. Campelo

arXiv:2309.12071v1, 1 citation, h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses the accessibility of LLMs for users without dedicated hardware by benchmarking performance on a specific educational task, though it is incremental as it applies existing quantization methods to new data.

The study evaluated quantized LLaMA-based models (Alpaca, Koala, Vicuna) on a Brazilian exam dataset, finding best accuracies of 46% on Portuguese and 49% on English questions, with processing times of 20-50 seconds on home hardware.

Although Large Language Models (LLMs) represent a revolution in the way we interact with computers, allowing users to pose complex questions and enabling reasoning over a sequence of statements, their use is restricted by the need for dedicated hardware. In this study, we evaluate the performance of LLMs based on the 7- and 13-billion-parameter LLaMA models, subjected to a quantization process and run on home hardware. The models considered were Alpaca, Koala, and Vicuna. To evaluate the effectiveness of these models, we developed a database containing 1,006 questions from the ENEM (Brazilian National Secondary School Exam). Our analysis revealed that the best-performing models achieved an accuracy of approximately 46% on the original Portuguese texts of the questions and 49% on their English translations. In addition, we evaluated the computational efficiency of the models by measuring the time required for execution. On average, the 7- and 13-billion-parameter models took approximately 20 and 50 seconds, respectively, to process the queries on a machine equipped with an AMD Ryzen 5 3600x processor.
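The two quantities reported above, accuracy over a set of multiple-choice questions and average execution time, could be measured with a loop like the one below. This is only a minimal sketch under stated assumptions, not the authors' evaluation code: the `Question` record, the `benchmark` function, and the `predict` callable (standing in for a call to a quantized model that returns an answer letter) are all hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Question:
    """Hypothetical record for one multiple-choice exam question."""
    prompt: str   # question text together with its answer options
    answer: str   # gold alternative, e.g. "A" through "E"


def benchmark(
    questions: Iterable[Question],
    predict: Callable[[str], str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (accuracy, mean seconds per question).

    `predict` is an assumed stand-in for the model call; it receives the
    prompt and must return the chosen alternative as a string.
    """
    correct = 0
    total_seconds = 0.0
    count = 0
    for q in questions:
        start = clock()
        guess = predict(q.prompt)
        total_seconds += clock() - start
        # Compare answer letters case-insensitively, ignoring whitespace.
        correct += int(guess.strip().upper() == q.answer.strip().upper())
        count += 1
    if count == 0:
        raise ValueError("no questions to evaluate")
    return correct / count, total_seconds / count
```

Running `benchmark` once on the Portuguese prompts and once on their English translations, with the same `predict`, would give the kind of per-language comparison the abstract describes. The `clock` parameter exists only so that timing can be replaced with a deterministic source when testing.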
