Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation
For researchers and practitioners using LLMs for academic discovery, this work reveals that prompt design is a non-trivial source of bias that should be systematically audited alongside model choice.
This paper audits 43 LLMs for scholar recommendation and finds that prompt design (language, location, role-and-task) affects the diversity and factuality of the output: South Africa prompts yield less factual lists, while Japan prompts yield highly factual but homogeneous lists. Basic technical quality is driven by model choice, factuality and parity by context, and diversity by location.
Large language models (LLMs) are increasingly used as scholar recommenders, shaping who is seen as an expert in academia. Existing audits remain English-centric, single-discipline, and persona-agnostic, leaving the source of output variability poorly understood. To address this gap, we propose a benchmark that disentangles the effects of model choice and prompt design on recommendations. We audit 43 LLMs by varying persona prompts (language, location, role-and-task) and context (field, seniority, k). Recommended scholars are compared against Semantic Scholar across six scientific disciplines to measure technical quality (factuality, coverage) and social representativeness (diversity, parity). Basic technical quality is driven by model choice, factuality and parity by context, and diversity by location. South Africa prompts yield less factual lists, while Japan prompts yield highly factual but homogeneous lists skewed toward highly productive scholars. Prompt design is thus a non-trivial axis of LLM-based scholar discovery and should be systematically audited alongside model choice.
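To make the audit design concrete, the following is a minimal sketch of how a grid of persona and context factors might be enumerated and how a simple factuality score could be computed against a reference set of scholar names. It is an illustration under stated assumptions, not the benchmark's implementation: the factor levels, the prompt template, the function names `build_prompts` and `factuality`, and the definition of factuality as the share of recommended names found in the reference set are all hypothetical.

```python
from itertools import product

# Hypothetical factor levels for illustration only; the benchmark's actual
# levels and prompt wording are not given in the abstract.
LANGUAGES = ["English", "Japanese"]
LOCATIONS = ["South Africa", "Japan"]
ROLES = ["a graduate student looking for collaborators"]
FIELDS = ["physics"]
SENIORITIES = ["early-career", "senior"]
K_VALUES = [5, 10]


def build_prompts():
    """Enumerate every combination of persona and context factors."""
    prompts = []
    for lang, loc, role, field, seniority, k in product(
        LANGUAGES, LOCATIONS, ROLES, FIELDS, SENIORITIES, K_VALUES
    ):
        # Assumed template: a real audit would translate the prompt into
        # the target language rather than tag it.
        text = (
            f"[language: {lang}] I am {role} based in {loc}. "
            f"Recommend {k} {seniority} scholars in {field}."
        )
        prompts.append({"language": lang, "location": loc, "k": k, "text": text})
    return prompts


def factuality(recommended, reference_names):
    """Share of recommended names found in the reference set (assumed definition)."""
    if not recommended:
        return 0.0
    normalized = {name.strip().lower() for name in reference_names}
    hits = sum(1 for name in recommended if name.strip().lower() in normalized)
    return hits / len(recommended)


if __name__ == "__main__":
    grid = build_prompts()
    print(len(grid), "prompts in the grid")
    # Placeholder names, not real scholars or real model output.
    print(factuality(["Scholar A", "Scholar B"], {"scholar a"}))
```

Crossing every factor with every other one is what allows the effect of a single factor, such as location, to be separated from model choice when the scores are aggregated. Exact name matching is a simplification; matching against a bibliographic database in practice would need author disambiguation.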