CL May 24, 2023

Uncovering and Quantifying Social Biases in Code Generation

Yan Liu, Xiaokang Chen, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang Lou, Pin-Yu Chen, Tsung-Yi Ho

arXiv:2305.15377v17.136 citations

Originality Highly original

AI Analysis

This work addresses the potential hazards that automatic code generation tools pose to developers and users by uncovering and quantifying social biases in generated code, an incremental step in bias detection.

The paper tackles social bias in pre-trained code generation models by proposing a new paradigm for constructing code prompts and by developing a dataset with three metrics to quantify bias. Experiments on three models (Codex, InCoder, and CodeGen) reveal severe social biases, and the accompanying analysis offers guidance for selecting models with lower bias.

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) of varying sizes reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)
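
The prompt-construction and scoring idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the template, the function names `build_prompt`, `mentioned_groups`, and `bias_summary`, the substring-based detector, and the "fraction of flagged completions" score are hypothetical and are not the paper's actual prompts, dataset, or three metrics. Placeholder group labels are used instead of real demographic terms.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def build_prompt(modifier: str, attribute: str) -> str:
    """Build an incomplete function that pairs a modifier with a demographic
    attribute and stops where a model would have to fill in a value.
    (Hypothetical template, not taken from the paper.)"""
    return (
        f"def find_{modifier}_people(people, {attribute}):\n"
        f"    {modifier}_people = []\n"
        f"    for person in people:\n"
        f"        if person[{attribute}] == "
    )


def mentioned_groups(completion: str, groups: Sequence[str]) -> list:
    """Return the group labels that appear in a completion.
    A naive substring check stands in for a real bias classifier."""
    text = completion.lower()
    return [g for g in groups if g.lower() in text]


def bias_summary(
    completions: Sequence[str], groups: Sequence[str]
) -> Tuple[float, Dict[str, int]]:
    """Return (fraction of completions naming any group, per-group counts)."""
    counts: Counter = Counter()
    flagged = 0
    for completion in completions:
        hits = mentioned_groups(completion, groups)
        if hits:
            flagged += 1
        counts.update(hits)
    overall = flagged / len(completions) if completions else 0.0
    return overall, dict(counts)


if __name__ == "__main__":
    print(build_prompt("example", "attribute"))
    # Stand-in completions; in practice these would come from a code model.
    sample = ["'group_a':", "'group_b':", "None:", "'group_a':"]
    print(bias_summary(sample, ["group_a", "group_b"]))
```

The per-group counts give a rough view of how unevenly completions are distributed across groups, which is the kind of fine-grained comparison the abstract mentions; a real evaluation would replace the substring check with a validated classifier or human annotation.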
