PerFairX: Is There a Balance Between Fairness and Personality in Large Language Model Recommendations?
This work addresses the problem of balancing personalization and equity in AI recommendations for developers and users. The contribution is incremental, since it builds on existing LLM and fairness research.
The study examines the tension between personality-based personalization and demographic fairness in LLM-based recommender systems. It finds that personality-aware prompting improves alignment with individual traits but can exacerbate fairness disparities across demographic groups. Of the two models compared, DeepSeek achieves a stronger psychological fit but is more sensitive to prompt variations, while ChatGPT is more stable but less personalized.
The integration of Large Language Models (LLMs) into recommender systems has enabled zero-shot, personality-based personalization through prompt-based interactions, offering a new paradigm for user-centric recommendations. However, incorporating user personality traits via the OCEAN model highlights a critical tension between achieving psychological alignment and ensuring demographic fairness. To address this, we propose PerFairX, a unified evaluation framework designed to quantify the trade-offs between personalization and demographic equity in LLM-generated recommendations. Using neutral and personality-sensitive prompts across diverse user profiles, we benchmark two state-of-the-art LLMs, ChatGPT and DeepSeek, on movie (MovieLens 10M) and music (Last.fm 360K) datasets. Our results reveal that personality-aware prompting significantly improves alignment with individual traits but can exacerbate fairness disparities across demographic groups. Specifically, DeepSeek achieves stronger psychological fit but exhibits higher sensitivity to prompt variations, while ChatGPT delivers stable yet less personalized outputs. PerFairX provides a principled benchmark to guide the development of LLM-based recommender systems that are both equitable and psychologically informed, contributing to the creation of inclusive, user-centric AI applications in continual learning contexts.
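The contrast between neutral and personality-sensitive prompts, and the idea of a demographic disparity score, can be illustrated with a minimal Python sketch. The abstract does not give PerFairX's prompt templates or metric definitions, so everything here is an assumption made for illustration: the `UserProfile` fields, the prompt wording, the 0-to-1 trait scale, and the `group_gap` measure (difference between the best and worst group mean of some per-user quality score) are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The five OCEAN (Big Five) personality traits.
OCEAN = ("openness", "conscientiousness", "extraversion",
         "agreeableness", "neuroticism")


@dataclass
class UserProfile:
    """Hypothetical user record; field names are illustrative only."""
    history: List[str]                        # items the user liked
    ocean: Optional[Dict[str, float]] = None  # assumed trait scores in [0, 1]


def build_prompt(profile: UserProfile, k: int = 5) -> str:
    """Return a neutral prompt, or a personality-sensitive one if traits are set."""
    lines = [f"The user liked: {', '.join(profile.history)}."]
    if profile.ocean is not None:
        traits = ", ".join(f"{t}={profile.ocean[t]:.1f}" for t in OCEAN)
        lines.append(f"The user's OCEAN personality scores (0 to 1) are: {traits}.")
    lines.append(f"Recommend {k} items as a numbered list.")
    return "\n".join(lines)


def group_gap(scores_by_group: Dict[str, List[float]]) -> float:
    """Assumed disparity measure: max group mean minus min group mean."""
    means = [sum(v) / len(v) for v in scores_by_group.values()]
    return max(means) - min(means)
```

Under these assumptions, one would send both prompt variants for the same user to each model, score the returned lists per user, and compare `group_gap` between the neutral and personality-sensitive conditions; a larger gap in the second condition would correspond to the kind of fairness disparity the abstract reports.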