Exploring Robustness of LLMs to Paraphrasing Based on Sociodemographic Factors
This work addresses the robustness of LLMs to global linguistic variation, a problem relevant to AI researchers and practitioners. Its contribution is incremental, since it extends existing datasets and methods.
The study investigated how large language models (LLMs) perform on paraphrases conditioned on sociodemographic factors such as age and gender. It found that demographic-based paraphrasing significantly affects model performance, which indicates that linguistic variation remains a challenge.
Despite their linguistic prowess, LLMs have been shown to be vulnerable to small input perturbations. While robustness to local adversarial changes has been studied, robustness to global modifications such as different linguistic styles remains underexplored. We therefore explore a wider range of variations across sociodemographic dimensions. We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic factors (age and gender). The assessment aims to provide a deeper understanding of LLMs in (a) their ability to generate demographic paraphrases with engineered prompts and (b) their ability to interpret real-world, complex language scenarios. We also perform a reliability analysis of the generated paraphrases, examining linguistic diversity and perplexity alongside manual evaluation. We find that demographic-based paraphrasing significantly impacts the performance of language models, indicating that the subtleties of linguistic variation remain a challenge. We will make the code and dataset available for future research.
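To make the evaluation idea concrete, the sketch below shows one way such an analysis could be set up. It is not the authors' code: the function names, the choice of a distinct-n ratio as the diversity measure, and the group labels are all assumptions made for illustration. It computes a simple lexical diversity score over a set of paraphrases and a per-group accuracy that can be compared against the original, unparaphrased questions.

```python
from collections import defaultdict


def distinct_n(texts, n=2):
    """Share of unique n-grams among all n-grams in `texts`.

    Assumption: whitespace tokenization and a distinct-n ratio stand in
    for whatever diversity measure the paper actually uses.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def accuracy_by_group(records):
    """Accuracy per paraphrase group.

    `records` is an iterable of (group, predicted_label, gold_label).
    Group names such as "original" or "older" are hypothetical.
    """
    correct = defaultdict(int)
    count = defaultdict(int)
    for group, predicted, gold in records:
        count[group] += 1
        correct[group] += int(predicted == gold)
    return {group: correct[group] / count[group] for group in count}
```

Comparing the accuracy of each demographic group against the "original" group gives a rough view of how much a paraphrase style shifts model performance. A perplexity check would additionally need a language model and is left out of this sketch.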