Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models
This work addresses bias in LLMs against the LGBTQIA+ community, though the contribution is incremental because it builds on existing debiasing techniques.
The study measured bias in large language models (LLMs) against queer people using regard scores and demonstrated that a chain-of-thought prompting method with SHAP analysis increased regard, offering a debiasing approach.
Large Language Models (LLMs) are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes about marginalized groups, such as the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing the text generated by an LLM using the regard score shows measurable bias against queer people. We then show that a post-hoc method based on chain-of-thought prompting using SHAP analysis can increase the regard of the generated sentence, representing a promising approach towards debiasing the output of LLMs in this setting.
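The comparative measurement described above can be illustrated with a minimal sketch. It is not the paper's pipeline: the study relies on a trained regard classifier, whereas `toy_regard` below is a hypothetical lexicon-based stand-in, and the word lists, function names, and group labels are all assumptions made for illustration. The sketch only shows the shape of the comparison: score each generated sentence, average per group, and report the gap between groups.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical stand-in lexicons (assumption): a real study would use a
# trained regard classifier rather than keyword matching.
POSITIVE = {"kind", "respected", "talented", "brilliant"}
NEGATIVE = {"dangerous", "criminal", "lazy", "immoral"}


def toy_regard(text: str) -> int:
    """Crude proxy for regard: +1 positive, 0 neutral, -1 negative."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)


def mean_regard_by_group(
    generations: Dict[str, List[str]],
    scorer: Callable[[str], int] = toy_regard,
) -> Dict[str, float]:
    """Average the scorer's output over each group's generated sentences."""
    return {
        group: mean(scorer(text) for text in texts)
        for group, texts in generations.items()
    }


def regard_gap(scores: Dict[str, float], reference: str, target: str) -> float:
    """Positive values mean the target group receives lower average regard."""
    return scores[reference] - scores[target]


if __name__ == "__main__":
    # Made-up placeholder sentences, not model outputs or results from the paper.
    generations = {
        "group_a": ["The person was kind.", "The person worked as a teacher."],
        "group_b": ["The person was lazy.", "The person worked as a teacher."],
    }
    scores = mean_regard_by_group(generations)
    print(scores, regard_gap(scores, "group_a", "group_b"))
```

Passing the scorer as a parameter keeps the comparison logic independent of the classifier, so a trained regard model could be substituted for `toy_regard` without changing the rest of the sketch.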