AI · PL · Oct 31, 2022

A Simple, Yet Effective Approach to Finding Biases in Code Generation

Spyridon Mouselinos, Mateusz Malinowski, Henryk Michalewski

arXiv:2211.00609v245.3224 citations · h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses biases in code generation for developers and users, but it is incremental as it builds on existing methods for bias analysis.

The paper tackles biases that code generation systems inherit from their large language model backbones, which can reduce the quality of generated code. It presents a framework that exposes these biases through automated interventions and mitigates them by applying the same interventions as a data transformation during fine-tuning.

Recently, high-performing code generation systems based on large language models have surfaced. They are trained on massive corpora containing much more natural text than actual executable computer code. This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones, which can reduce the quality of the generated code under specific circumstances. To investigate the effect, we propose the "block of influence" concept, which enables a modular decomposition and analysis of the coding challenges. We introduce an automated intervention mechanism reminiscent of adversarial testing that exposes undesired biases through the failure modes of the models under test. Finally, we demonstrate how our framework can be used as a data transformation technique during fine-tuning, acting as a mitigation strategy for these biases.
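To make the intervention idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a coding challenge can be split into three separately editable blocks (function name, description, examples); the abstract does not name the blocks, so `Challenge`, `intervene`, `passes`, and `toy_model` are all hypothetical. The toy model is deliberately biased toward the function name, so that an intervention on that block alone flips a passing solution into a failing one.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # maps a prompt to a generated function body


@dataclass(frozen=True)
class Challenge:
    # Hypothetical decomposition of a prompt into independently editable blocks.
    name: str
    description: str
    examples: str

    def render(self) -> str:
        # Assemble the blocks into a function header plus docstring.
        return (
            f'def {self.name}(xs):\n'
            f'    """{self.description}\n'
            f'    {self.examples}\n'
            f'    """\n'
        )


def intervene(ch: Challenge, block: str, new_value: str) -> Challenge:
    # Replace exactly one block and leave the others untouched.
    return replace(ch, **{block: new_value})


def passes(model: Model, ch: Challenge, tests: List[Tuple[list, int]]) -> bool:
    # Run the generated code and check it against the unit tests.
    prompt = ch.render()
    namespace: dict = {}
    try:
        exec(prompt + model(prompt), namespace)
        return all(namespace[ch.name](arg) == want for arg, want in tests)
    except Exception:
        return False


def toy_model(prompt: str) -> str:
    # Stand-in for a real code model (assumption): it keys on the function
    # name in the header and ignores the description entirely.
    header = prompt.splitlines()[0]
    if "sum" in header:
        return "    return sum(xs)\n"
    return "    out = 1\n    for x in xs:\n        out *= x\n    return out\n"


original = Challenge(
    name="multiply_all",
    description="Return the product of all elements of xs.",
    examples="Example: [2, 3, 4] -> 24",
)
tests = [([2, 3, 4], 24), ([5], 5)]

# One intervention per block under study; the task itself never changes.
interventions: Dict[str, Tuple[str, str]] = {
    "rename": ("name", "list_sum"),
    "paraphrase": ("description", "Multiply every element of xs together."),
}

results = {"original": passes(toy_model, original, tests)}
for label, (block, value) in interventions.items():
    results[label] = passes(toy_model, intervene(original, block, value), tests)

print(results)
```

In this sketch only the rename flips the outcome, which points to the name block as the source of the failure. With a real model, `toy_model` would be replaced by a call to the system under test, and the same `intervene` function could in principle be applied to training prompts before fine-tuning.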
