CL · May 13

From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks

arXiv:2605.13408 · 86.5
Predicted impact: top 46% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

This work provides a new benchmark for evaluating linguistic reasoning in humans and machines, but the contribution is incremental as it focuses on converting existing puzzles rather than introducing a fundamentally new task.

The authors created a paired dataset of Rosetta Stone and Match-Up linguistic puzzles by converting existing puzzles, and found that both human experts and LLMs exhibit an all-or-nothing solving pattern on Match-Up puzzles.

In this paper, we examine linguistic puzzles used in high school linguistics competitions, focusing on two common formats: Rosetta Stone and Match-Up. We propose a systematic procedure for converting existing Rosetta Stone puzzles into corresponding Match-Up counterparts. Because linguistic puzzle creation is complex and time-consuming, our method provides an efficient way to accelerate the generation of new puzzles. We evaluate the resulting Rosetta Stone-Match-Up pairs with both human participants and large language models (LLMs). Our results show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles, either solving them completely or failing entirely. This work contributes a new dataset of paired puzzles and provides a detailed evaluation of puzzle difficulty across formats, offering insights into both human and machine linguistic reasoning.

