CL · Mar 29

ProText: A benchmark dataset for measuring (mis)gendering in long-form texts

Hadas Kotek, Margit Bowler, Patrick Sonnenberg, Yu'an Yang

arXiv:2603.27838 · 77.6 · h-index: 13

Predicted impact: top 48% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This dataset provides a tool for evaluating gender bias in text transformations like summarization, addressing gaps beyond traditional pronoun resolution and the gender binary.

ProText is a benchmark dataset for measuring (mis)gendering in long-form texts across diverse themes and pronouns. A case study with two prompts and two LLMs revealed systematic gender bias, especially when inputs lack explicit gender cues or models default to heteronormative assumptions.

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the gender binary. We validated ProText through a mini case study, showing that even with just two prompts and two models, we can draw nuanced insights regarding gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, particularly when inputs contain no explicit gender cues or when models default to heteronormative assumptions.
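The three dimensions named in the abstract can be pictured as a grid of conditions. The sketch below is only an illustration, not the authors' code or data format. The identifiers, the assumption that the dimensions are fully crossed, and the small pronoun lexicon are all hypothetical. It enumerates the grid and runs a crude surface check for whether a transformed text (for example, a summary) uses a pronoun category that the source did not contain.

```python
from itertools import product

# Dimension values come from the abstract; the identifier spellings are illustrative.
THEME_NOUNS = ["name", "occupation", "title", "kinship_term"]
THEME_CATEGORIES = ["stereotypically_male", "stereotypically_female", "gender_neutral"]
PRONOUN_CATEGORIES = ["masculine", "feminine", "gender_neutral", "none"]


def condition_grid():
    """Return every combination of the three dimensions (assumes a full cross)."""
    return [
        {"theme_noun": n, "theme_category": c, "pronoun_category": p}
        for n, c, p in product(THEME_NOUNS, THEME_CATEGORIES, PRONOUN_CATEGORIES)
    ]


# Hypothetical pronoun lexicon; a real evaluation would need far more care
# (e.g. singular vs. plural "they", neopronouns, coreference).
PRONOUNS = {
    "masculine": {"he", "him", "his", "himself"},
    "feminine": {"she", "her", "hers", "herself"},
    "gender_neutral": {"they", "them", "their", "theirs", "themself", "themselves"},
}


def pronoun_categories(text):
    """Return the set of pronoun categories whose pronouns appear in text."""
    tokens = {t.strip(".,;:!?\"'()").lower() for t in text.split()}
    return {cat for cat, words in PRONOUNS.items() if tokens & words}


def introduces_gender(source, output):
    """Flag outputs that use a pronoun category absent from the source."""
    return bool(pronoun_categories(output) - pronoun_categories(source))
```

A check like `introduces_gender("The nurse arrived early.", "She arrived early.")` flags the case the abstract highlights, where the input carries no explicit gender cue but the output adds one. Token matching of this kind cannot tell whether a pronoun actually refers to the theme noun, so it is a starting point rather than a measure of misgendering.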
