CL · Nov 15, 2025

Do LLMs and Humans Find the Same Questions Difficult? A Case Study on Japanese Quiz Answering

arXiv:2511.12300v1 · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses LLM-human difficulty alignment in quiz answering, a question of interest to NLP researchers, but its contribution is incremental because it focuses on a single domain (Japanese quizzes).

The study investigates whether LLMs and humans find the same Japanese quiz questions difficult. It finds that, compared to humans, LLMs struggle more with questions whose answers are not covered by Wikipedia and with questions that require numerical answers.

LLMs have achieved performance that surpasses humans in many NLP tasks. However, it remains unclear whether problems that are difficult for humans are also difficult for LLMs. This study investigates how the difficulty of quizzes in a buzzer setting differs between LLMs and humans. Specifically, we first collect Japanese quiz data, including questions, answers, and human correct response rates, then prompt LLMs to answer the quizzes under several settings, and compare their correct answer rates to those of humans from two analytical perspectives. The experimental results show that, compared to humans, LLMs struggle more with quizzes whose correct answers are not covered by Wikipedia entries, and also have difficulty with questions that require numerical answers.
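
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the records, the field names (human_rate, llm_correct, in_wikipedia, numerical), and the helper rate_gap_by are all hypothetical, and the toy values are invented purely to show the arithmetic. The idea is to group questions by a property and compare the mean human correct response rate with the LLM's accuracy inside each group.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only
# and are NOT taken from the paper's dataset.
records = [
    {"human_rate": 0.8, "llm_correct": True,  "in_wikipedia": True,  "numerical": False},
    {"human_rate": 0.6, "llm_correct": True,  "in_wikipedia": True,  "numerical": False},
    {"human_rate": 0.5, "llm_correct": False, "in_wikipedia": False, "numerical": False},
    {"human_rate": 0.7, "llm_correct": False, "in_wikipedia": False, "numerical": True},
    {"human_rate": 0.4, "llm_correct": True,  "in_wikipedia": True,  "numerical": True},
    {"human_rate": 0.6, "llm_correct": False, "in_wikipedia": False, "numerical": True},
]


def rate_gap_by(rows, key):
    """Group rows by rows[i][key] and compare human and LLM correct rates.

    Returns {group_value: {"human": ..., "llm": ..., "gap": ...}} where
    "gap" is LLM accuracy minus the mean human correct response rate.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for value, members in groups.items():
        human = sum(m["human_rate"] for m in members) / len(members)
        llm = sum(1 for m in members if m["llm_correct"]) / len(members)
        summary[value] = {"human": human, "llm": llm, "gap": llm - human}
    return summary


by_wikipedia = rate_gap_by(records, "in_wikipedia")
by_numerical = rate_gap_by(records, "numerical")

for name, summary in [("in_wikipedia", by_wikipedia), ("numerical", by_numerical)]:
    for value, stats in sorted(summary.items()):
        print(f"{name}={value}: human={stats['human']:.2f} "
              f"llm={stats['llm']:.2f} gap={stats['gap']:+.2f}")
```

A negative gap in a group means the LLM does worse than the average human on those questions. On real data one would also want to check group sizes before reading much into any single gap.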

