CL, Jan 30

JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and JDs

Casimiro Pio Carrino, Paula Estrella, Rabih Zbib, Carlos Escolano, José A. R. Fonollosa

arXiv:2601.23183v1 | 0.6 | h-index: 12 | Has Code

Originality: Synthesis-oriented

AI Analysis

JobResQA provides a reproducible benchmark for advancing fair and reliable LLM-based HR systems, though the contribution is incremental: it builds on existing MRC and multilingual evaluation methods.

The authors address the evaluation of LLMs' machine reading comprehension on HR-specific tasks by introducing JobResQA, a multilingual benchmark with 581 QA pairs across five languages. Baseline evaluations show higher performance in English and Spanish and substantial degradation in the other languages.

We introduce JobResQA, a multilingual Question Answering benchmark for evaluating Machine Reading Comprehension (MRC) capabilities of LLMs on HR-specific tasks involving résumés and job descriptions. The dataset comprises 581 QA pairs across 105 synthetic résumé-job description pairs in five languages (English, Spanish, Italian, German, and Chinese), with questions spanning three complexity levels from basic factual extraction to complex cross-document reasoning. We propose a data generation pipeline derived from real-world sources through de-identification and data synthesis to ensure both realism and privacy, while controlled demographic and professional attributes (implemented via placeholders) enable systematic bias and fairness studies. We also present a cost-effective, human-in-the-loop translation pipeline based on the TEaR methodology, incorporating MQM error annotations and selective post-editing to ensure a high-quality multi-way parallel benchmark. We provide baseline evaluations across multiple open-weight LLM families using an LLM-as-judge approach, revealing higher performance on English and Spanish but substantial degradation for other languages, highlighting critical gaps in multilingual MRC capabilities for HR applications. JobResQA provides a reproducible benchmark for advancing fair and reliable LLM-based HR systems. The benchmark is publicly available at: https://github.com/Avature/jobresqa-benchmark
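
The abstract mentions that demographic and professional attributes are implemented via placeholders to enable bias and fairness studies. The following is a minimal sketch of that general idea, not the authors' pipeline: the `$`-style placeholder syntax, the attribute names (`name`, `role`, `city`, `years`), the sample text, and the helpers `fill_placeholders` and `make_variants` are all assumptions made for illustration. The actual placeholder format is defined in the benchmark repository.

```python
from string import Template

# Hypothetical résumé snippet. The $-placeholder syntax and the attribute
# names are assumptions for illustration, not the benchmark's real format.
RESUME_TEMPLATE = Template(
    "$name is a $role based in $city with $years years of experience."
)


def fill_placeholders(template, attributes):
    """Fill every placeholder; substitute() raises KeyError if one is missing."""
    return template.substitute(attributes)


def make_variants(template, base, attribute, values):
    """Return one filled text per value, varying a single attribute only."""
    return {
        value: fill_placeholders(template, {**base, attribute: value})
        for value in values
    }


# Hypothetical attribute values; only the name changes between variants.
base = {"name": "Alex Doe", "role": "data engineer", "city": "Madrid", "years": "5"}
variants = make_variants(RESUME_TEMPLATE, base, "name", ["Alex Doe", "Maria Roe"])

for text in variants.values():
    print(text)
```

Because only one attribute differs between variants, any change in a model's answers to the same question can be attributed to that attribute, which is the kind of controlled comparison a bias study needs.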
