CL AI CY LG | Jan 1, 2023

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

MILA

arXiv:2301.00395v16.121 citations | h-index: 28 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the shortage of data for gender bias mitigation in Chinese, which the authors treat as an under-resourced language for this task. The contribution is incremental in that it extends existing work to a new linguistic context.

The authors tackled the problem of gender bias in Chinese natural language processing by creating CORGI-PM, a corpus with 32.9k sentences labeled for gender bias, and established baselines using state-of-the-art language models.

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques, such as large-scale language models, suffer from data inadequacy and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM), which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
