Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
This work matters to AI researchers and developers who rely on LLM self-improvement methods: it shows that self-refinement can degrade task performance because of a systematic flaw in how models evaluate their own output. The findings are incremental, building on prior work on LLM biases.
The paper studies LLM self-bias in self-refinement. It shows that this bias is prevalent across six models and three task types (translation, constrained text generation, and mathematical reasoning), that it is amplified during refinement, and that larger models and accurate external feedback can reduce it and improve task performance.
Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this discrepancy is due to an LLM's bias in evaluating its own output. In this paper, we formally define an LLM's self-bias - the tendency to favor its own generation - using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such bias, we find that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias.
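
The abstract states that self-bias is defined using two statistics but does not name them here. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the first statistic is the mean gap between a model's self-assigned quality score and an external reference score, and the second is the distance skewness of those gaps (0 when the gaps are symmetric around zero, approaching 1 when they lean to one side). The function names and the toy scores are hypothetical.

```python
from itertools import product


def score_gaps(self_scores, reference_scores):
    """Per-sample gap: self-assigned score minus external reference score."""
    if len(self_scores) != len(reference_scores) or not self_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    return [s - r for s, r in zip(self_scores, reference_scores)]


def bias(self_scores, reference_scores):
    """Assumed statistic 1: mean gap. Positive means the model overrates itself."""
    gaps = score_gaps(self_scores, reference_scores)
    return sum(gaps) / len(gaps)


def distance_skewness(self_scores, reference_scores):
    """Assumed statistic 2: distance skewness of the gaps around zero.

    dSkew = 1 - sum_ij |g_i - g_j| / sum_ij |g_i + g_j|
    """
    gaps = score_gaps(self_scores, reference_scores)
    numerator = sum(abs(a - b) for a, b in product(gaps, repeat=2))
    denominator = sum(abs(a + b) for a, b in product(gaps, repeat=2))
    if denominator == 0:
        # All gaps are zero: no asymmetry to measure.
        return 0.0
    return 1 - numerator / denominator


# Toy example (made-up scores): the model rates every output one point too high.
self_scores = [3, 3]
reference_scores = [2, 2]
print(bias(self_scores, reference_scores))
print(distance_skewness(self_scores, reference_scores))
```

Tracking both values at each self-refine iteration would show whether the gap between self-assessment and external assessment grows, which is the amplification effect the abstract describes.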