IR · LG · Aug 28, 2025

Revealing Potential Biases in LLM-Based Recommender Systems in the Cold Start Setting

arXiv:2508.20401v2 · 3 citations · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses fairness concerns for users in cold-start scenarios, where only limited data is available. The advance is incremental because it builds on existing bias evaluation methods.

The paper tackles fairness in LLM-based recommender systems in cold-start settings. It reveals consistent biases, including gendered and cultural stereotypes, across the music, movie, and college domains, and finds a non-linear relationship between model size and fairness.

Large Language Models (LLMs) are increasingly used for recommendation tasks due to their general-purpose capabilities. While LLMs perform well in rich-context settings, their behavior in cold-start scenarios, where only limited signals such as age, gender, or language are available, raises fairness concerns because they may rely on societal biases encoded during pretraining. We introduce a benchmark specifically designed to evaluate fairness in zero-context recommendation. Our modular pipeline supports configurable recommendation domains and sensitive attributes, enabling systematic and flexible audits of any open-source LLM. Through evaluations of state-of-the-art models (Gemma 3 and Llama 3.2), we uncover consistent biases across recommendation domains (music, movies, and colleges) including gendered and cultural stereotypes. We also reveal a non-linear relationship between model size and fairness, highlighting the need for nuanced analysis.
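To make the idea of a zero-context audit concrete, the following is a minimal sketch, not the authors' pipeline. It assumes an audit works by generating one prompt per combination of sensitive-attribute values and then comparing the recommendation lists returned for different groups. The function names (`build_prompts`, `jaccard`, `pairwise_overlap`), the prompt wording, and the use of Jaccard overlap as a comparison measure are all illustrative assumptions; the abstract does not specify the benchmark's actual prompts or metrics.

```python
from itertools import combinations, product


def build_prompts(domain, attributes, k=5):
    """Build one zero-context prompt per combination of attribute values.

    `attributes` maps an attribute name to its candidate values,
    e.g. {"gender": ["female", "male"]}. The prompt template is a
    hypothetical example, not the wording used in the paper.
    """
    names = list(attributes)
    prompts = {}
    for values in product(*(attributes[name] for name in names)):
        profile = ", ".join(f"{n}: {v}" for n, v in zip(names, values))
        prompts[values] = (
            f"Recommend {k} {domain} for a user with this profile "
            f"({profile}). Return only a list."
        )
    return prompts


def jaccard(a, b):
    """Jaccard overlap of two recommendation lists (order ignored)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def pairwise_overlap(recs):
    """Compare every pair of groups; low overlap means the lists diverge."""
    return {
        (g1, g2): jaccard(recs[g1], recs[g2])
        for g1, g2 in combinations(recs, 2)
    }


if __name__ == "__main__":
    prompts = build_prompts("movies", {"gender": ["female", "male"]}, k=3)
    for profile, prompt in prompts.items():
        print(profile, "->", prompt)

    # Placeholder outputs standing in for model responses (not real data).
    recs = {
        "female": ["item_a", "item_b", "item_c"],
        "male": ["item_b", "item_c", "item_d"],
    }
    print(pairwise_overlap(recs))
```

In a real audit, the placeholder lists would be replaced by the parsed outputs of the model under test, and the domain and attribute dictionary would be swapped to cover other settings such as music or colleges. Low overlap alone does not establish bias; it only flags group pairs whose recommendations differ and deserve closer inspection.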
