CL · AI · Sep 21, 2025

Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset

arXiv:2510.01219v1

Originality: Synthesis-oriented

AI Analysis

This work targets hidden biases in large language models and is relevant to researchers and developers who evaluate them. The contribution is incremental, since it builds on existing concept learning methods.

The authors introduce a dataset of concept learning tasks for uncovering implicit biases in large language models. Using in-context concept learning, they find that models may be biased toward upward monotonicity in quantifiers, and that this bias is less apparent under direct prompting.

We introduce a dataset of concept learning tasks that helps uncover implicit biases in large language models. Using in-context concept learning experiments, we found that language models may have a bias toward upward monotonicity in quantifiers; such bias is less apparent when the model is tested by direct prompting without concept learning components. This demonstrates that in-context concept learning can be an effective way to discover hidden biases in language models.
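To make "upward monotonicity" concrete: a quantifier Q is upward monotone in its second argument if, whenever Q(A, B) holds and B is a subset of B', Q(A, B') also holds. "At least two A are B" has this property, while "at most two" and "exactly two" do not. The snippet below is a minimal sketch that checks the property by brute force over a small universe. It is not the paper's dataset or evaluation code, and all function names are illustrative assumptions.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_upward_monotone(quantifier, universe):
    """Check: Q(A, B) and B <= B2 imply Q(A, B2), for all sets in `universe`."""
    all_sets = list(subsets(universe))
    for a in all_sets:
        for b in all_sets:
            if not quantifier(a, b):
                continue
            for b2 in all_sets:
                # A counterexample: enlarging B makes the quantifier false.
                if b <= b2 and not quantifier(a, b2):
                    return False
    return True


# Illustrative quantifiers over sets A (restrictor) and B (scope).
def at_least_two(a, b):
    return len(a & b) >= 2


def at_most_two(a, b):
    return len(a & b) <= 2


def exactly_two(a, b):
    return len(a & b) == 2


UNIVERSE = frozenset(range(4))
results = {
    q.__name__: is_upward_monotone(q, UNIVERSE)
    for q in (at_least_two, at_most_two, exactly_two)
}
print(results)
```

Under these assumptions, only `at_least_two` passes the check. A model with the bias described above would tend to generalize from in-context examples toward concepts of that kind.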
