CL · AI · Sep 10, 2024

Medal Matters: Probing LLMs' Failure Cases Through Olympic Rankings

arXiv:2409.06518v3 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work helps researchers interpret how LLMs integrate the knowledge they store, though the contribution is incremental because it builds on existing probing methods.

The study tackled the problem of understanding LLMs' internal knowledge structures by evaluating them on historical Olympic medal tallies, finding that while they excel at retrieving medal counts, they struggle with providing rankings, highlighting a key difference from human reasoning.

Large language models (LLMs) have achieved remarkable success in natural language processing tasks, yet their internal knowledge structures remain poorly understood. This study examines these structures through the lens of historical Olympic medal tallies, evaluating LLMs on two tasks: (1) retrieving medal counts for specific teams and (2) identifying rankings of each team. While state-of-the-art LLMs excel in recalling medal counts, they struggle with providing rankings, highlighting a key difference between their knowledge organization and human reasoning. These findings shed light on the limitations of LLMs' internal knowledge integration and suggest directions for improvement. To facilitate further research, we release our code, dataset, and model outputs.
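The gap between the two tasks is easier to see once both are written against the same table. The sketch below is a minimal illustration, not the paper's released code: the `Tally` type, the helper names, the team names, and all medal numbers are hypothetical, and the gold-first ordering (ties broken by silver, then bronze) is an assumed convention, since the abstract does not state which ranking rule is used. Task 1 is a single lookup, while task 2 requires comparing one team's row against every other row.

```python
from typing import NamedTuple


class Tally(NamedTuple):
    """One row of a medal table (hypothetical structure)."""
    team: str
    gold: int
    silver: int
    bronze: int


def medal_count(tallies: list[Tally], team: str, medal: str) -> int:
    """Task 1: retrieve a single medal count for one team."""
    row = next(t for t in tallies if t.team == team)
    return getattr(row, medal)


def rank_teams(tallies: list[Tally]) -> dict[str, int]:
    """Task 2: derive each team's rank from the full table.

    Assumes gold-first ordering with silver and bronze as tie-breakers.
    Teams with identical counts share a rank, and the next rank is skipped.
    """
    ordered = sorted(
        tallies,
        key=lambda t: (t.gold, t.silver, t.bronze),
        reverse=True,
    )
    ranks: dict[str, int] = {}
    prev_key = None
    prev_rank = 0
    for position, row in enumerate(ordered, start=1):
        key = (row.gold, row.silver, row.bronze)
        if key != prev_key:
            # New medal profile: rank equals the 1-based position.
            prev_rank = position
            prev_key = key
        ranks[row.team] = prev_rank
    return ranks


# Illustrative numbers only; these are not real Olympic results.
TALLIES = [
    Tally("Team A", 10, 5, 2),
    Tally("Team B", 10, 7, 1),
    Tally("Team C", 3, 9, 9),
    Tally("Team D", 3, 9, 9),
]

if __name__ == "__main__":
    print(medal_count(TALLIES, "Team A", "silver"))
    print(rank_teams(TALLIES))
```

A probe built this way can score a model's answer to "how many gold medals did Team A win?" against `medal_count`, and its answer to "what was Team A's rank?" against `rank_teams`, using the same underlying table for both.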

Code Implementations: 1 repo
