IR · AI · Sep 8, 2025

Datasets for Navigating Sensitive Topics in Recommendation Systems

Amelia Kovacs, Jerry Chee, Kimia Kazemian, Sarah Dean

arXiv:2509.07269v1 · 8.52 citations · h-index: 1 · WWW

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for better tools to evaluate recommendation systems and mitigate their harmful effects on user well-being, though its contribution is incremental because it focuses on dataset creation.

The paper tackles the problem of evaluating how much personalized AI systems expose users to sensitive content. It introduces two novel datasets with sensitivity labels, enabling quantitative assessment beyond engagement metrics.

Personalized AI systems, from recommendation systems to chatbots, are a prevalent method for distributing content to users based on their learned preferences. However, there is growing concern about the adverse effects of these systems, including their potential tendency to expose users to sensitive or harmful material, negatively impacting overall well-being. To address this concern quantitatively, it is necessary to create datasets with relevant sensitivity labels for content, enabling researchers to evaluate personalized systems beyond mere engagement metrics. To this end, we introduce two novel datasets that include a taxonomy of sensitivity labels alongside user-content ratings: one that integrates MovieLens rating data with content warnings from the Does the Dog Die? community ratings website, and another that combines fan-fiction interaction data and user-generated warnings from Archive of Our Own.
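As a rough illustration of what pairing user-content ratings with sensitivity labels makes possible, the sketch below computes, for each user, the fraction of rated items that carry a given warning. The toy records, the label names, and the `exposure_rate` helper are all hypothetical and do not reflect the schema or metrics of the released datasets.

```python
from collections import defaultdict

# Hypothetical toy data: the layout and label names are assumptions,
# not the actual schema of the datasets described in the abstract.
ratings = [
    # (user_id, item_id, rating)
    ("u1", "m1", 4.0),
    ("u1", "m2", 5.0),
    ("u2", "m2", 3.0),
    ("u2", "m3", 2.0),
]
warnings = {
    # item_id -> set of sensitivity labels attached to that item
    "m1": {"animal death"},
    "m2": set(),
    "m3": {"animal death", "self-harm"},
}


def exposure_rate(ratings, warnings, label):
    """Return, per user, the fraction of rated items carrying `label`."""
    seen = defaultdict(int)     # items rated by each user
    flagged = defaultdict(int)  # rated items that carry the label
    for user, item, _rating in ratings:
        seen[user] += 1
        if label in warnings.get(item, set()):
            flagged[user] += 1
    return {user: flagged[user] / seen[user] for user in seen}


rates = exposure_rate(ratings, warnings, "animal death")
```

The same function could be applied to a recommender's output lists instead of historical ratings, which would give one simple exposure measure to report alongside engagement metrics.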
