CL AI LG · Mar 20, 2024

Reducing Large Language Model Bias with Emphasis on 'Restricted Industries': Automated Dataset Augmentation and Prejudice Quantification

arXiv:2403.13925v1 · 1.0 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses bias issues for users in restricted industries, but it is incremental: it builds on existing debiasing approaches, adding new metrics and a focus on data-scarce contexts.

The paper tackles bias in large language models, particularly in 'restricted industries' with limited data, by proposing an automated dataset augmentation method and introducing two new metrics (mb-index and db-index) to quantify bias from model architecture and datasets.

Despite the growing capabilities of large language models, there exist concerns about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation, viewed through the lens of bias producers and in the context of 'restricted industries' with limited data. We additionally create two new metrics, the mb-index and db-index, to quantify bias, considering the idea that bias occurs due to both intrinsic model architecture and dataset.
