CL · LG · Oct 14, 2025

Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering

arXiv:2510.12925v1 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the factual reliability of LLMs for real-world users. It is an incremental advance: it extends prior robustness testing to human-centered cues.

The paper tackles the problem of LLM robustness to user-provided personal context in factual question answering, finding that inquiry personas can meaningfully alter QA accuracy and trigger failure modes like refusals and hallucinations.

Large Language Models (LLMs) should answer factual questions truthfully, grounded in objective knowledge, regardless of user context such as self-disclosed personal information or system personalization. In this paper, we present the first systematic evaluation of LLM robustness to inquiry personas, i.e., user profiles that convey attributes like identity, expertise, or belief. While prior work has primarily focused on adversarial inputs or distractors for robustness testing, we evaluate plausible, human-centered inquiry persona cues that users disclose in real-world interactions. We find that such cues can meaningfully alter QA accuracy and trigger failure modes such as refusals, hallucinated limitations, and role confusion. These effects highlight how model sensitivity to user framing can compromise factual reliability, and position inquiry persona testing as an effective tool for robustness evaluation.

