CL | AI | Dec 12, 2024

What Makes Cryptic Crosswords Challenging for LLMs?

Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar

arXiv:2412.09012v2 | 11.920 citations | h-index: 7 | Has Code | COLING

Originality: Synthesis-oriented

AI Analysis

This paper addresses a specific NLP challenge of interest to researchers and developers working on language understanding and puzzle solving. Its contribution is incremental: it builds on prior knowledge of LLM limitations and does not introduce a new method.

The paper investigates why Large Language Models (LLMs) perform poorly on cryptic crosswords and establishes benchmark results for three models (Gemma2, LLaMA3, ChatGPT), showing that their performance is significantly below human level.

Cryptic crosswords are puzzles that rely on general knowledge and the solver's ability to manipulate language on different levels, dealing with various types of wordplay. Previous research suggests that solving such puzzles is challenging even for modern NLP models, including Large Language Models (LLMs). However, there is little to no research on the reasons for their poor performance on this task. In this paper, we establish the benchmark results for three popular LLMs: Gemma2, LLaMA3 and ChatGPT, showing that their performance on this task is still significantly below that of humans. We also investigate why these models struggle to achieve superior performance. We release our code and introduced datasets at https://github.com/bodasadallah/decrypting-crosswords.
