FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models
This toolkit addresses bias evaluation and mitigation for users of large language models, but the contribution is incremental, as it primarily integrates existing methods.
The paper presents FairPy, an open-source toolkit for evaluating and mitigating prediction biases in large language models such as BERT and GPT-2. It provides a modular interface for integrating existing debiasing algorithms and is publicly available.
Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions of their training corpora. In response, a number of mathematical frameworks have been proposed to quantify and identify these biases and to reduce the likelihood of biased token predictions. In this paper, we present a comprehensive survey of such techniques tailored to widely used LLMs such as BERT and GPT-2. We also introduce FairPy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models. FairPy also supports implementing existing debiasing algorithms. The toolkit is open-source and publicly available at https://github.com/HrishikeshVish/Fairpy.