Overview of the NLPCC 2025 Shared Task: Gender Bias Mitigation Challenge
The work tackles gender bias mitigation in NLP, specifically for Chinese, a language with fewer fairness resources. Its contribution is incremental, since it builds on existing data-driven techniques.
The paper introduces CORGI-PM, a Chinese corpus with 32.9k sentences including 5.2k gender-biased ones and their bias-eliminated versions, to address gender bias in NLP, and presents results from a shared task on automating bias detection, classification, and mitigation.
As natural language processing for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques, such as pre-trained language models, suffer from biased corpora. This problem is more pronounced for languages with fewer fairness-related computational linguistic resources, such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM), which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Notably, CORGI-PM contains 5.2k gender-biased sentences along with the corresponding bias-eliminated versions rewritten by human annotators. We pose three challenges as a shared task to automate the mitigation of textual gender bias, requiring models to detect, classify, and mitigate textual gender bias. In this paper, we present the results and analysis for the teams participating in this shared task at NLPCC 2025.
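To make the three challenges concrete, the following minimal Python sketch shows how one corpus record could map onto a detection label, a classification label set, and a mitigation reference. It is an illustration only: the `CorpusEntry` class, its field names, the placeholder labels `type_a` and `type_b`, and the assumption that a sentence may carry more than one bias type are all hypothetical and do not reflect the released CORGI-PM schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorpusEntry:
    """One sentence record. Field names are hypothetical, not the released schema."""
    sentence: str
    is_biased: bool
    bias_types: List[str] = field(default_factory=list)  # placeholder type labels
    rewrite: Optional[str] = None  # human bias-eliminated version, if biased


def detection_target(entry: CorpusEntry) -> int:
    """Challenge 1 (detect): 1 if the sentence is gender-biased, else 0."""
    return int(entry.is_biased)


def classification_target(entry: CorpusEntry) -> List[str]:
    """Challenge 2 (classify): sorted bias type labels of a biased sentence."""
    return sorted(entry.bias_types) if entry.is_biased else []


def mitigation_target(entry: CorpusEntry) -> str:
    """Challenge 3 (mitigate): the human rewrite; other sentences stay unchanged."""
    if entry.is_biased and entry.rewrite is not None:
        return entry.rewrite
    return entry.sentence


# Placeholder records, not real corpus data.
entries = [
    CorpusEntry("placeholder biased sentence", True,
                ["type_b", "type_a"], "placeholder rewrite"),
    CorpusEntry("placeholder neutral sentence", False),
]

for e in entries:
    print(detection_target(e), classification_target(e), mitigation_target(e))
```

Keeping the rewrite on the same record as the biased sentence mirrors the pairing described above, where each of the 5.2k biased sentences has a corresponding bias-eliminated version.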