How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

arXiv:2605.00113


AI Analysis

For researchers and practitioners auditing LLM fairness and adaptability, this work provides a reproducible framework for distinguishing surface from structural changes in responses when the system prompt supplies neurodivergence context.

The paper introduces NDBench, a benchmark for evaluating how frontier LLMs adapt to neurodivergence context in system prompts. Explicit adaptation instructions yield longer, more structured outputs (p < 10^-8), whereas persona assertion alone does not reduce harmful tendencies: masking-reinforcement drops (by 36-44%) only when adaptation is explicitly instructed.

We examine whether frontier chat-based large language models (LLMs) adjust their outputs based on neurodivergence (ND) context in system prompts, and we describe the nature of these adjustments. Specifically, we propose NDBench, a 576-output benchmark covering two frontier models, three system prompt types (baseline, ND-profile assertion, and ND-profile assertion with explicit instructions for adjustments), four canonical ND profiles, and 24 prompts across four categories, one of which involves an adversarial masking strategy. Four trends emerge consistently from our findings. First, LLMs adapt significantly under ND context: fully instructed conditions yield longer and more structured outputs, characterized by higher token counts, more headings, and more granular steps (p < 10^-8, Holm-corrected). Second, this adaptation is largely structural: although list density changes little, the frequency of headings and the amount of per-step detail rise markedly. Third, ND persona assertion alone fails to suppress potentially harmful tendencies: masking-reinforcement decreases only in explicitly instructed conditions (36-44% reduction) and barely changes under persona assertion alone. Fourth, reliability analysis of the LLM-based harm assessment shows that only two of the six dimensions (masking-reinforcement and validation quality) meet the pre-defined inter-judge agreement criterion (alpha >= 0.67) and can therefore be treated as primary results. NDBench is publicly available along with its prompts, outputs, code, and other resources, forming a reproducible framework for auditing how future LLMs adapt to ND context.
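
As a rough illustration of the design described in the abstract, the sketch below enumerates a fully crossed grid of 2 models x 3 system prompt types x 4 ND profiles x 24 prompts, which is one reading consistent with the stated total of 576 outputs. It also includes a generic Holm step-down adjustment of the kind the abstract refers to as "Holm-corrected". The model, profile, and prompt labels are hypothetical placeholders, and the p-values in the usage note are made up for demonstration. None of this is taken from the NDBench code release.

```python
from itertools import product

# Hypothetical placeholder labels; the abstract does not name the models,
# the ND profiles, or the individual prompts.
MODELS = ["model_A", "model_B"]
CONDITIONS = ["baseline", "nd_assertion", "nd_assertion_plus_instructions"]
PROFILES = [f"profile_{i}" for i in range(1, 5)]
PROMPTS = [f"prompt_{i:02d}" for i in range(1, 25)]


def build_grid():
    """Return every (model, condition, profile, prompt) cell of the crossed design."""
    return list(product(MODELS, CONDITIONS, PROFILES, PROMPTS))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


GRID = build_grid()
```

For example, `len(GRID)` gives the number of cells in this assumed design, and `holm_adjust([0.01, 0.04, 0.03])` adjusts three illustrative p-values for multiple comparisons. The adjusted values can then be compared against the chosen significance level.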
