Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness
This work addresses fairness in recommendation systems, which affect users in areas such as entertainment and job search, by providing more nuanced evaluation and mitigation methods. It builds incrementally on existing fairness-aware approaches.
The paper tackles gender bias in recommendation systems by introducing comprehensive metrics that evaluate fairness at a granular category level (e.g., movie genres). It shows that using these metrics as a regularization term during training significantly improves fairness without major performance loss, as demonstrated on three real-world datasets with five baseline and two fairness-aware models.
Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes. Therefore, evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances because it mainly focuses on comparing performance metrics across sensitive groups. In this paper, we introduce a set of comprehensive metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness at a more granular level, which our metrics achieve by capturing gender bias across categories of recommended items, such as movie genres. Furthermore, we show that employing a category-aware fairness metric as a regularization term alongside the main recommendation loss during training can effectively reduce bias in the models' output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics provide greater insight into bias in recommended items than previous metrics. Additionally, our results demonstrate that incorporating our regularization term significantly improves the fairness of recommendations across categories without substantial degradation in overall recommendation performance.
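The idea of adding a category-aware fairness term to the recommendation loss can be illustrated with a minimal sketch. The abstract does not define the metric, so the gap used here (the absolute difference, per category, between two gender groups' average share of recommendation mass) is an assumption chosen for illustration, not the paper's formulation. The names `category_gap`, `total_loss`, and the weight `lam` are likewise hypothetical. NumPy is used only to show the arithmetic; in actual training the same computation would be written in an autodiff framework so the term can be differentiated.

```python
import numpy as np


def category_gap(scores, item_category, user_group, n_categories):
    """Hypothetical category-aware gap (an assumption, not the paper's metric).

    scores:        (n_users, n_items) predicted relevance scores
    item_category: (n_items,) integer category id of each item
    user_group:    (n_users,) 0/1 gender group label of each user
    Returns a (n_categories,) array of absolute gaps between the groups.
    """
    scores = np.asarray(scores, dtype=float)
    item_category = np.asarray(item_category)
    user_group = np.asarray(user_group)

    # Softmax over items, so each user's scores become a distribution.
    z = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)

    # One-hot item-to-category matrix, shape (n_items, n_categories).
    onehot = np.eye(n_categories)[item_category]

    # Recommendation mass each user places on each category.
    user_cat = probs @ onehot

    # Average category mass per group, then the per-category gap.
    mass_g0 = user_cat[user_group == 0].mean(axis=0)
    mass_g1 = user_cat[user_group == 1].mean(axis=0)
    return np.abs(mass_g0 - mass_g1)


def total_loss(rec_loss, scores, item_category, user_group, n_categories, lam=0.1):
    """Main recommendation loss plus the weighted fairness regularizer."""
    gap = category_gap(scores, item_category, user_group, n_categories)
    return rec_loss + lam * gap.sum()


# Toy data: 4 users (two per group), 6 items in 3 categories.
item_category = [0, 0, 1, 1, 2, 2]
user_group = [0, 0, 1, 1]

# Group 0 is pushed toward category 0, group 1 toward category 1.
biased_scores = np.zeros((4, 6))
biased_scores[:2, :2] = 5.0
biased_scores[2:, 2:4] = 5.0

print(category_gap(biased_scores, item_category, user_group, 3))
print(total_loss(1.0, biased_scores, item_category, user_group, 3))
```

A larger `lam` trades recommendation accuracy for smaller per-category gaps; the value 0.1 is arbitrary. Reporting the gap per category, rather than a single aggregate, is what lets bias concentrated in a few categories remain visible.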