CL · AI · CY · LG · Jan 1, 2023

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

MILA
arXiv:2301.00395v1 · 20 citations · h-index: 28
Originality: Synthesis-oriented
AI Analysis

This work addresses data inadequacy for gender bias mitigation in Chinese, an under-resourced language, though it is incremental: it extends existing work to a new linguistic context.

The authors tackled the problem of gender bias in Chinese natural language processing by creating CORGI-PM, a corpus with 32.9k sentences labeled for gender bias, and established baselines using state-of-the-art language models.

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived from an annotation scheme developed specifically for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.

Code Implementations: 1 repo
