Mar 10, 2025

MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark

arXiv:2503.07144v1 | 1 citation | h-index: 30 | Has Code
Originality: Synthesis-oriented
AI Analysis

MRCEval gives researchers and developers a more thorough tool for evaluating natural language understanding in LLMs, though the contribution is incremental because it builds on existing MRC datasets.

The authors address the lack of a comprehensive machine reading comprehension (MRC) benchmark by introducing MRCEval, a set of 2.1K multi-choice questions covering 13 skills, and find that MRC remains challenging for the 28 LLMs they tested.

Machine Reading Comprehension (MRC) is an essential task in evaluating natural language understanding. Existing MRC datasets primarily assess specific aspects of reading comprehension (RC), lacking a comprehensive MRC benchmark. To fill this gap, we first introduce a novel taxonomy that categorizes the key capabilities required for RC. Based on this taxonomy, we construct MRCEval, an MRC benchmark that leverages advanced Large Language Models (LLMs) as both sample generators and selection judges. MRCEval is a comprehensive, challenging and accessible benchmark designed to assess the RC capabilities of LLMs thoroughly, covering 13 distinct RC skills with a total of 2.1K high-quality multi-choice questions. We perform an extensive evaluation of 28 widely used open-source and proprietary models, highlighting that MRC continues to present significant challenges even in the era of LLMs.
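Because the benchmark pairs multi-choice questions with skill labels, results are naturally reported as accuracy per skill. The following is a minimal sketch of that kind of scoring, not the authors' evaluation code: the field names `skill` and `answer`, the skill names, and the toy data are all hypothetical and are not taken from the MRCEval release.

```python
from collections import defaultdict


def per_skill_accuracy(samples, predictions):
    """Return {skill: accuracy} for a list of multi-choice predictions.

    `samples` is a list of dicts with hypothetical keys "skill" and "answer";
    `predictions` is a parallel list of predicted option letters.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, predicted in zip(samples, predictions):
        total[sample["skill"]] += 1
        if predicted == sample["answer"]:
            correct[sample["skill"]] += 1
    return {skill: correct[skill] / total[skill] for skill in total}


# Toy data for illustration only (not drawn from the benchmark).
samples = [
    {"skill": "skill_a", "answer": "A"},
    {"skill": "skill_a", "answer": "C"},
    {"skill": "skill_b", "answer": "B"},
]
predictions = ["A", "B", "B"]

scores = per_skill_accuracy(samples, predictions)
```

Reporting one score per skill rather than a single overall number makes it easier to see which of the reading comprehension capabilities a given model struggles with.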

Code Implementations: 1 repo