Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection
For autistic communities and NLP practitioners, this work highlights critical flaws in how LLMs evaluate ableist language. The contribution is incremental, however: it extends existing bias detection methods with a new annotator-weighting scheme rather than introducing a new detection approach.
The paper introduces a bias-aware evaluation framework using psychometrically-weighted ground truth based on annotator positionality to detect anti-autistic ableist language. It finds that LLMs frequently produce harmful outputs, mislabel community-reclaimed language as ableist, and rely on surface-level keyword matching rather than context.
Large language models (LLMs) are increasingly used in decision-making tasks where they can amplify or suppress perspectives, raising concerns in high-stakes settings affecting autistic communities. While previous research has identified disability-related biases in LLMs, it remains unclear how they conceptualize ableism or detect it in text. We introduce a bias-aware evaluation framework targeting anti-autistic ableist language with a psychometrically-weighted, community-proximate ground truth anchored in annotator positionality. This framework constitutes a stricter standard than conventional majority-vote aggregation, which significantly and consistently underweights autistic and autism-accepting perspectives. We find that LLMs frequently produce harmful outputs, mislabel community-reclaimed language as ableist, and express more negative attitudes toward autistic people when assessment instruments are masked. Our error analysis reveals that models rely on surface-level keyword matching rather than contextual factors such as speaker identity and whether the language fosters in-group solidarity or inflicts out-group harm.
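
The contrast between majority-vote aggregation and a positionality-weighted ground truth can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything below is an assumption for illustration only: the function names, the normalized weighted mean, the 0.5 threshold, and the per-annotator weights (imagined here as scores in [0, 1] derived from a psychometric instrument) are hypothetical and are not the authors' implementation.

```python
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return 1 (ableist) if strictly more than half of the annotators said so."""
    return int(2 * sum(labels) > len(labels))


def weighted_label(
    labels: Sequence[int],
    weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> int:
    """Return 1 (ableist) if the weight-normalized share of positive labels exceeds the threshold."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * y for w, y in zip(weights, labels)) / total
    return int(score > threshold)


# Hypothetical annotator weights: higher means closer to the autistic
# community's perspective (an assumed stand-in for a psychometric score).
weights = [0.9, 0.2, 0.3, 0.8, 0.2]

# Item A: only the two high-weight annotators flag the text as ableist.
item_a = [1, 0, 0, 1, 0]
# Item B (e.g. reclaimed language): only the low-weight annotators flag it.
item_b = [0, 1, 1, 0, 1]

results = {
    "item_a": (majority_vote(item_a), weighted_label(item_a, weights)),
    "item_b": (majority_vote(item_b), weighted_label(item_b, weights)),
}
print(results)
```

Under these assumed weights, the two schemes disagree on both items: majority vote misses item A and flags item B, while the weighted label does the reverse. This is the kind of divergence the abstract describes when it says majority-vote aggregation underweights autistic and autism-accepting perspectives.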