CL · Mar 6, 2025

Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference

arXiv:2503.05047v1 · 19 citations · h-index: 2 · COLING

Originality: Incremental advance

AI Analysis

This work addresses biases in LLM-generated datasets, a concern for NLP researchers and practitioners, and shows that LLM-elicited data inherits issues similar to those of human-annotated data. The contribution is incremental but important for dataset creation.

The study tested whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases similar to those in crowdsourced datasets. Using GPT-4, Llama-2 70b, and Mistral 7b, the authors recreated a portion of the Stanford Natural Language Inference (SNLI) corpus. Fine-tuned BERT hypothesis-only classifiers achieved 86-96% accuracy on the LLM-generated datasets, and further analyses characterized the specific annotation artifacts and stereotypical biases present.
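A hypothesis-only test trains a classifier that never sees the premise: if it predicts the label well above chance, the hypotheses alone leak the label through annotation artifacts. The study fine-tunes BERT for this. The sketch below swaps in a much simpler multinomial Naive Bayes over hypothesis tokens so that it runs on the Python standard library alone. The names `HypothesisOnlyNB`, `tokenize`, and `accuracy`, the whitespace tokenizer, and the toy training pairs are all hypothetical. They illustrate the idea and are not the authors' code or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class HypothesisOnlyNB:
    """Multinomial Naive Bayes that sees only the hypothesis, never the premise.

    A simple stand-in (assumption) for a fine-tuned BERT hypothesis-only classifier.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing for unseen words

    def fit(self, hypotheses, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for hyp, label in zip(hypotheses, labels):
            tokens = tokenize(hyp)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, hypothesis):
        n_examples = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each hypothesis token.
            score = math.log(count / n_examples)
            denom = self.total_words[label] + self.alpha * vocab_size
            for tok in tokenize(hypothesis):
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, hypotheses, labels):
    correct = sum(model.predict(h) == y for h, y in zip(hypotheses, labels))
    return correct / len(labels)


# Hypothetical toy data mimicking well-known NLI artifacts:
# negation -> contradiction, generic words -> entailment,
# added modifiers or purpose clauses -> neutral.
TRAIN = [
    ("nobody is sleeping", "contradiction"),
    ("the man is not running", "contradiction"),
    ("a person is outdoors", "entailment"),
    ("someone is outside", "entailment"),
    ("a man is waiting for his friend", "neutral"),
    ("the woman is tall and happy", "neutral"),
]
TEST = [
    ("nobody is eating", "contradiction"),
    ("a person is outside", "entailment"),
    ("a man is waiting for a bus", "neutral"),
]

model = HypothesisOnlyNB().fit([h for h, _ in TRAIN], [y for _, y in TRAIN])
print(accuracy(model, [h for h, _ in TEST], [y for _, y in TEST]))  # 1.0 on this toy split
```

On real data, the resulting accuracy would be compared against the majority-class baseline, which is roughly one in three for a label-balanced three-way split. Scores far above that baseline, like the 86-96% reported here, suggest that the hypotheses carry label-revealing artifacts.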

We test whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases like those in NLP datasets elicited from crowdworkers. We recreate a portion of the Stanford Natural Language Inference corpus using GPT-4, Llama-2 70b for Chat, and Mistral 7b Instruct. We train hypothesis-only classifiers to determine whether LLM-elicited NLI datasets contain annotation artifacts. Next, we use pointwise mutual information to identify the words in each dataset that are associated with gender, race, and age-related terms. On our LLM-generated NLI datasets, fine-tuned BERT hypothesis-only classifiers achieve between 86% and 96% accuracy. Our analyses further characterize the annotation artifacts and stereotypical biases in LLM-generated datasets.
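Pointwise mutual information measures how much more often a word co-occurs with a set of identity terms than it would if the two were independent: PMI(w, g) = log p(w, g) / (p(w) p(g)). The abstract does not specify the co-occurrence window, log base, smoothing, or term lists. The sketch below is therefore one plausible sentence-level version under those assumptions. Over a list of hypotheses, it counts the fraction of sentences that contain each word, each group's terms, and both, and it uses log base 2. `GROUP_TERMS`, `pmi_associations`, `min_count`, and the example sentences are hypothetical.

```python
import math
from collections import Counter

# Hypothetical identity-term lists; a real analysis would cover gender,
# race, and age terms with larger, carefully curated lists.
GROUP_TERMS = {
    "women": {"woman", "women", "girl", "girls", "she", "her"},
    "men": {"man", "men", "boy", "boys", "he", "his"},
}


def pmi_associations(sentences, group_terms, min_count=1):
    """Rank words by sentence-level PMI with each group of identity terms.

    PMI(w, g) = log2(p(w, g) / (p(w) * p(g))), where each p is the fraction
    of sentences containing w, any term of g, or both.
    """
    n = len(sentences)
    token_sets = [set(s.lower().split()) for s in sentences]
    word_df = Counter()  # number of sentences containing each word
    for toks in token_sets:
        word_df.update(toks)

    results = {}
    for group, terms in group_terms.items():
        group_sents = [toks for toks in token_sets if toks & terms]
        p_group = len(group_sents) / n
        joint = Counter()  # sentences where the word co-occurs with the group
        for toks in group_sents:
            joint.update(toks - terms)  # skip the identity terms themselves
        scores = []
        for word, count in joint.items():
            if count < min_count:
                continue  # rare words give noisy, inflated PMI
            p_joint = count / n
            p_word = word_df[word] / n
            scores.append((word, math.log2(p_joint / (p_word * p_group))))
        scores.sort(key=lambda ws: (-ws[1], ws[0]))  # highest PMI first
        results[group] = scores
    return results


SENTENCES = [  # hypothetical LLM-generated hypotheses
    "a woman is cooking dinner",
    "the woman is cooking",
    "a man is fixing a car",
    "the man is driving a car",
    "a girl is cooking",
    "a boy is playing soccer",
]

assoc = pmi_associations(SENTENCES, GROUP_TERMS, min_count=2)
print(assoc["women"][0])  # ('cooking', 1.0)
print(assoc["men"][0])    # ('car', 1.0)
```

Words with high positive PMI for one group, such as "cooking" for the women terms in this toy set, are candidates for stereotypical associations. The minimum-count filter matters because PMI overweights words that appear only once or twice. Without it, "dinner" would tie with "cooking" on a single occurrence.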
