CLMay 24, 2023

Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution

arXiv:2305.15389v14 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the problem of understanding and comparing cognitive biases in AI models and humans for researchers in NLP and cognitive science, though it is incremental as it builds on existing bias evaluation methods.

The study quantified gender bias in coreference resolution by comparing models and humans using dual-process theory, finding that humans made about 3% more biased decisions on real-world data, while models were about 12% more biased on synthetic data.

Spurious correlations were found to be an important factor explaining model performance in various NLP tasks (e.g., gender or racial artifacts), often considered to be ''shortcuts'' to the actual task. However, humans tend to similarly make quick (and sometimes wrong) predictions based on societal and cognitive presuppositions. In this work we address the question: can we quantify the extent to which model biases reflect human behaviour? Answering this question will help shed light on model performance and provide meaningful comparisons against humans. We approach this question through the lens of the dual-process theory for human decision-making. This theory differentiates between an automatic unconscious (and sometimes biased) ''fast system'' and a ''slow system'', which when triggered may revisit earlier automatic reactions. We make several observations from two crowdsourcing experiments of gender bias in coreference resolution, using self-paced reading to study the ''fast'' system, and question answering to study the ''slow'' system under a constrained time setting. On real-world data humans make $\sim$3\% more gender-biased decisions compared to models, while on synthetic data models are $\sim$12\% more biased.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes