A Study on Accuracy, Miscalibration, and Popularity Bias in Recommendations
This work addresses inconsistent recommendation performance, a concern for both users and developers, but it is incremental: it builds on existing metrics and introduces no new methods.
The study analyzes how miscalibration and popularity bias relate to recommendation accuracy across user groups that differ in their preference for popular content. It finds that users with little interest in popular content receive the least accurate recommendations, and that specific genres contribute to performance inconsistencies.
Recent research has suggested different metrics to measure the inconsistency of recommendation performance, including the accuracy difference between user groups, miscalibration, and popularity lift. However, a study that relates miscalibration and popularity lift to recommendation accuracy across different user groups is still missing. It is also unclear whether particular genres contribute to the emergence of inconsistency in recommendation performance across user groups. In this paper, we analyze these three aspects for five well-known recommendation algorithms and for user groups that differ in their preference for popular content. We further study how different genres affect the inconsistency of recommendation performance, and how this effect aligns with the popularity of the genres. Using data from LastFm, MovieLens, and MyAnimeList, we present two key findings. First, users with little interest in popular content receive the worst recommendation accuracy, and this is aligned with miscalibration and popularity lift. Second, particular genres contribute to the inconsistency of recommendation performance to different extents, especially in terms of miscalibration in the case of the MyAnimeList dataset.
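The abstract names miscalibration and popularity lift without defining them, so a small sketch may help readers see what such metrics measure. The version below assumes definitions that are common in the literature, not necessarily the ones used in this paper: miscalibration as the KL divergence between the genre distribution of a user's profile and that of the recommendations, and popularity lift as the relative change in group average popularity from profiles to recommendations. All function names, the smoothing parameter `alpha`, and the toy data layout are hypothetical.

```python
import math
from collections import Counter


def genre_distribution(items, item_genres):
    """Normalized genre distribution over a non-empty list of items.

    Assumption: an item with several genres splits its weight equally.
    """
    weights = Counter()
    for item in items:
        genres = item_genres[item]
        for genre in genres:
            weights[genre] += 1.0 / len(genres)
    total = sum(weights.values())
    return {genre: w / total for genre, w in weights.items()}


def miscalibration(profile_items, rec_items, item_genres, alpha=0.01):
    """KL divergence from the profile distribution p to the recommendation
    distribution q. q is smoothed with p (weight alpha, an assumed value)
    so that genres missing from the recommendations do not yield log(0).
    """
    p = genre_distribution(profile_items, item_genres)
    q = genre_distribution(rec_items, item_genres)
    kl = 0.0
    for genre, p_g in p.items():
        q_g = (1 - alpha) * q.get(genre, 0.0) + alpha * p_g
        kl += p_g * math.log(p_g / q_g)
    return kl


def group_average_popularity(item_lists, item_popularity):
    """Mean over users of the mean popularity of each user's items."""
    per_user = [
        sum(item_popularity[i] for i in items) / len(items)
        for items in item_lists
    ]
    return sum(per_user) / len(per_user)


def popularity_lift(profiles, recs, item_popularity):
    """Relative change in group average popularity, profiles -> recs.

    Positive values mean the recommendations are more popular than
    what the group consumed.
    """
    gap_profile = group_average_popularity(profiles, item_popularity)
    gap_recs = group_average_popularity(recs, item_popularity)
    return (gap_recs - gap_profile) / gap_profile
```

Under these assumptions, a user group with niche tastes that is recommended mostly popular items would show a large positive popularity lift, and a user whose recommended genres diverge from their profile would show high miscalibration. This is the kind of alignment with low accuracy that the abstract reports.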