CL, AI · Dec 28, 2023

Evaluating the Performance of Large Language Models for Spanish Language in Undergraduate Admissions Exams

arXiv:2312.16845v1 · 2 citations · h-index: 5 · Comput Sist
Originality: Synthesis-oriented
AI Analysis

This work assesses LLM performance on Spanish-language educational testing and provides benchmarks for undergraduate admissions exams in Mexico. Its contribution is incremental, since it applies existing models to new data.

The study evaluated GPT-3.5 and BARD on Spanish-language undergraduate admissions exams in Mexico and found that both models exceeded the minimum acceptance scores by up to 75% for some programs, with GPT-3.5 marginally outperforming BARD overall (60.94% vs. 60.42%, a gap of 0.52 percentage points).

This study evaluates the performance of large language models, specifically GPT-3.5 and BARD (powered by the Gemini Pro model), on undergraduate admissions exams set by the National Polytechnic Institute in Mexico. The exams cover three areas: Engineering/Mathematical and Physical Sciences, Biological and Medical Sciences, and Social and Administrative Sciences. Both models demonstrated proficiency, exceeding the minimum acceptance scores of the respective academic programs by up to 75% for some programs. GPT-3.5 outperformed BARD in Mathematics and Physics, while BARD performed better in History and on fact-based questions. Overall, GPT-3.5 marginally surpassed BARD, scoring 60.94% versus 60.42%.
