Exploring How Fair Model Representations Relate to Fair Recommendations
This work addresses user-side fairness in recommender systems and offers insight into how fairness is evaluated, though its contribution is incremental, refining existing fairness definitions.
The study challenges the assumption that fair model representations directly lead to fair recommendations: optimizing for fair representations improves recommendation parity, but representation-level evaluation is not a reliable proxy for this effect.
One of the many fairness definitions pursued in recent recommender system research targets mitigating demographic information encoded in model representations. Models optimized for this definition are typically evaluated on how well demographic attributes can be classified given model representations, with the (implicit) assumption that this measure accurately reflects \textit{recommendation parity}, i.e., how similar the recommendations given to different users are. We challenge this assumption by comparing the amount of demographic information encoded in representations with various measures of how the recommendations differ. We propose two new approaches for measuring how well demographic information can be classified given ranked recommendations. Our results from extensive testing of multiple models on one real and multiple synthetically generated datasets indicate that optimizing for fair representations positively affects recommendation parity, but also that evaluation at the representation level is not a good proxy for measuring this effect when comparing models. We also provide extensive insight into how recommendation-level fairness metrics behave for various models by evaluating their performance on numerous generated datasets with different properties.
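To make the idea of classifying demographic information from ranked recommendations concrete, the following is a minimal sketch of one possible recommendation-level probe. It is an illustrative assumption, not either of the two approaches proposed in this work: each ranked list is turned into a rank-discounted item profile, and a leave-one-out nearest-centroid classifier tries to recover each user's group. The function names, the logarithmic discount, and the toy data are all hypothetical. Accuracy near chance suggests the recommendations carry little group information; accuracy near 1.0 suggests they differ strongly between groups.

```python
from collections import defaultdict
from math import log2


def rank_weighted_profile(ranked_items):
    """Map a ranked list to {item: weight}; higher-ranked items weigh more.

    The 1 / log2(rank + 2) discount is an assumed choice for illustration.
    """
    return {item: 1.0 / log2(rank + 2) for rank, item in enumerate(ranked_items)}


def centroid(profiles):
    """Average a list of sparse item profiles."""
    total = defaultdict(float)
    for profile in profiles:
        for item, weight in profile.items():
            total[item] += weight
    return {item: weight / len(profiles) for item, weight in total.items()}


def dot(a, b):
    """Sparse dot product between two item profiles."""
    return sum(weight * b.get(item, 0.0) for item, weight in a.items())


def leave_one_out_accuracy(recs, groups):
    """Fraction of users whose group is recovered from their ranked list.

    recs:   list of ranked item lists, one per user
    groups: list of group labels, aligned with recs
    Ties are broken in favor of the alphabetically first label.
    """
    profiles = [rank_weighted_profile(r) for r in recs]
    labels = sorted(set(groups))
    correct = 0
    for i, (profile, group) in enumerate(zip(profiles, groups)):
        scores = {}
        for label in labels:
            # Build each group centroid without the held-out user.
            others = [
                profiles[j]
                for j in range(len(profiles))
                if j != i and groups[j] == label
            ]
            if others:
                scores[label] = dot(profile, centroid(others))
        predicted = max(scores, key=scores.get)
        correct += predicted == group
    return correct / len(recs)


# Toy data (hypothetical): two groups of two users each.
groups = ["A", "A", "B", "B"]
segregated = [["a", "b", "c"], ["a", "b", "c"], ["x", "y", "z"], ["x", "y", "z"]]
identical = [["a", "b", "c"]] * 4

acc_segregated = leave_one_out_accuracy(segregated, groups)
acc_identical = leave_one_out_accuracy(identical, groups)
```

On this toy data the probe separates the groups perfectly when each group receives disjoint recommendations, and falls to chance level when every user receives the same list. A representation-level probe would apply the same kind of classifier to user embeddings instead of ranked lists; comparing the two accuracies across models is the kind of check the argument above calls for.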