Bamshad Mobasher

IR
h-index72
32papers
1,687citations
Novelty42%
AI Score44

32 Papers

LGJul 1, 2022
Behavioral Player Rating in Competitive Online Shooter Games

Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell et al.

Competitive online games use rating systems for matchmaking; progression-based algorithms that estimate the skill level of players with interpretable ratings in terms of the outcome of the games they played. However, the overall experience of players is shaped by factors beyond the sole outcome of their games. In this paper, we engineer several features from in-game statistics to model players and create ratings that accurately represent their behavior and true performance level. We then compare the estimating power of our behavioral ratings against ratings created with three mainstream rating systems by predicting rank of players in four popular game modes from the competitive shooter genre. Our results show that the behavioral ratings present more accurate performance estimations while maintaining the interpretability of the created representations. Considering different aspects of the playing behavior of players and using behavioral ratings for matchmaking can lead to match-ups that are more aligned with players' goals and interests, consequently resulting in a more enjoyable gaming experience.

IROct 31, 2025
Effectiveness of LLMs in Temporal User Profiling for Recommendation

Milad Sabouri, Masoud Mansoury, Kun Lin et al.

Effectively modeling the dynamic nature of user preferences is crucial for enhancing recommendation accuracy and fostering transparency in recommender systems. Traditional user profiling often overlooks the distinction between transitory short-term interests and stable long-term preferences. This paper examines the capability of leveraging Large Language Models (LLMs) to capture these temporal dynamics, generating richer user representations through distinct short-term and long-term textual summaries of interaction histories. Our observations suggest that while LLMs tend to improve recommendation quality in domains with more active user engagement, their benefits appear less pronounced in sparser environments. This disparity likely stems from the varying distinguishability of short-term and long-term preferences across domains; the approach shows greater utility where these temporal interests are more clearly separable (e.g., Movies\&TV) compared to domains with more stable user profiles (e.g., Video Games). This highlights a critical trade-off between enhanced performance and computational costs, suggesting context-dependent LLM application. Beyond predictive capability, this LLM-driven approach inherently provides an intrinsic potential for interpretability through its natural language profiles and attention weights. This work contributes insights into the practical capability and inherent interpretability of LLM-driven temporal user profiling, outlining new research directions for developing adaptive and transparent recommender systems.

IRMay 1, 2025
Towards Explainable Temporal User Profiling with LLMs

Milad Sabouri, Masoud Mansoury, Kun Lin et al.

Accurately modeling user preferences is vital not only for improving recommendation performance but also for enhancing transparency in recommender systems. Conventional user profiling methods, such as averaging item embeddings, often overlook the evolving, nuanced nature of user interests, particularly the interplay between short-term and long-term preferences. In this work, we leverage large language models (LLMs) to generate natural language summaries of users' interaction histories, distinguishing recent behaviors from more persistent tendencies. Our framework not only models temporal user preferences but also produces natural language profiles that can be used to explain recommendations in an interpretable manner. These textual profiles are encoded via a pre-trained model, and an attention mechanism dynamically fuses the short-term and long-term embeddings into a comprehensive user representation. Beyond boosting recommendation accuracy over multiple baselines, our approach naturally supports explainability: the interpretable text summaries and attention weights can be exposed to end users, offering insights into why specific items are suggested. Experiments on real-world datasets underscore both the performance gains and the promise of generating clearer, more transparent justifications for content-based recommendations.

IRAug 11, 2025
Using LLMs to Capture Users' Temporal Context for Recommendation

Milad Sabouri, Masoud Mansoury, Kun Lin et al.

Effective recommender systems demand dynamic user understanding, especially in complex, evolving environments. Traditional user profiling often fails to capture the nuanced, temporal contextual factors of user preferences, such as transient short-term interests and enduring long-term tastes. This paper presents an assessment of Large Language Models (LLMs) for generating semantically rich, time-aware user profiles. We do not propose a novel end-to-end recommendation architecture; instead, the core contribution is a systematic investigation into the degree of LLM effectiveness in capturing the dynamics of user context by disentangling short-term and long-term preferences. This approach, framing temporal preferences as dynamic user contexts for recommendations, adaptively fuses these distinct contextual components into comprehensive user embeddings. The evaluation across Movies&TV and Video Games domains suggests that while LLM-generated profiles offer semantic depth and temporal structure, their effectiveness for context-aware recommendations is notably contingent on the richness of user interaction histories. Significant gains are observed in dense domains (e.g., Movies&TV), whereas improvements are less pronounced in sparse environments (e.g., Video Games). This work highlights LLMs' nuanced potential in enhancing user profiling for adaptive, context-aware recommendations, emphasizing the critical role of dataset characteristics for practical applicability.

IRAug 11, 2025
Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations

Milad Sabouri, Masoud Mansoury, Kun Lin et al.

Accurately modeling user preferences is crucial for improving the performance of content-based recommender systems. Existing approaches often rely on simplistic user profiling methods, such as averaging or concatenating item embeddings, which fail to capture the nuanced nature of user preference dynamics, particularly the interactions between long-term and short-term preferences. In this work, we propose LLM-driven Temporal User Profiling (LLM-TUP), a novel method for user profiling that explicitly models short-term and long-term preferences by leveraging interaction timestamps and generating natural language representations of user histories using a large language model (LLM). These representations are encoded into high-dimensional embeddings using a pre-trained BERT model, and an attention mechanism is applied to dynamically fuse the short-term and long-term embeddings into a comprehensive user profile. Experimental results on real-world datasets demonstrate that LLM-TUP achieves substantial improvements over several baselines, underscoring the effectiveness of our temporally aware user-profiling approach and the use of semantically rich user profiles, generated by LLMs, for personalized content-based recommendation.

GTNov 29, 2021
Player Modeling using Behavioral Signals in Competitive Online Games

Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell et al.

Competitive online games use rating systems to match players with similar skills to ensure a satisfying experience for players. In this paper, we focus on the importance of addressing different aspects of playing behavior when modeling players for creating match-ups. To this end, we engineer several behavioral features from a dataset of over 75,000 battle royale matches and create player models based on the retrieved features. We then use the created models to predict ranks for different groups of players in the data. The predicted ranks are compared to those of three popular rating systems. Our results show the superiority of simple behavioral models over mainstream rating systems. Some behavioral features provided accurate predictions for all groups of players while others proved useful for certain groups of players. The results of this study highlight the necessity of considering different aspects of the player's behavior such as goals, strategy, and expertise when making assignments.

IRSep 2, 2021
How does the User's Knowledge of the Recommender Influence their Behavior?

Muheeb Faizan Ghori, Arman Dehpanah, Jonathan Gemmell et al.

Recommender systems have become a ubiquitous part of modern web applications. They help users discover new and relevant items. Today's users, through years of interaction with these systems have developed an inherent understanding of how recommender systems function, what their objectives are, and how the user might manipulate them. We describe this understanding as the Theory of the Recommender. In this study, we conducted semi-structured interviews with forty recommender system users to empirically explore the relevant factors influencing user behavior. Our findings, based on a rigorous thematic analysis of the collected data, suggest that users possess an intuitive and sophisticated understanding of the recommender system's behavior. We also found that users, based upon their understanding, attitude, and intentions change their interactions to evoke desired recommender behavior. Finally, we discuss the potential implications of such user behavior on recommendation performance.

IRAug 7, 2021
Unbiased Cascade Bandits: Mitigating Exposure Bias in Online Learning to Rank Recommendation

Masoud Mansoury, Himan Abdollahpouri, Bamshad Mobasher et al.

Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This is especially problematic when bias is amplified over time as a few popular items are repeatedly over-represented in recommendation lists. This phenomenon can be viewed as a recommendation feedback loop: the system repeatedly recommends certain items at different time points and interactions of users with those items will amplify bias towards those items over time. This issue has been extensively studied in the literature on model-based or neighborhood-based recommendation algorithms, but less work has been done on online recommendation models such as those based on multi-armed Bandit algorithms. In this paper, we study exposure bias in a class of well-known bandit algorithms known as Linear Cascade Bandits. We analyze these algorithms on their ability to handle exposure bias and provide a fair representation for items and suppliers in the recommendation results. Our analysis reveals that these algorithms fail to treat items and suppliers fairly and do not sufficiently explore the item space for each user. To mitigate this bias, we propose a discounting factor and incorporate it into these algorithms that controls the exposure of items at each time step. To show the effectiveness of the proposed discounting factor on mitigating exposure bias, we perform experiments on two datasets using three cascading bandit algorithms and our experimental results show that the proposed method improves the exposure fairness for items and suppliers.

IRJul 7, 2021
A Graph-based Approach for Mitigating Multi-sided Exposure Bias in Recommender Systems

Masoud Mansoury, Himan Abdollahpouri, Mykola Pechenizkiy et al.

Fairness is a critical system-level objective in recommender systems that has been the subject of extensive recent research. A specific form of fairness is supplier exposure fairness where the objective is to ensure equitable coverage of items across all suppliers in recommendations provided to users. This is especially important in multistakeholder recommendation scenarios where it may be important to optimize utilities not just for the end-user, but also for other stakeholders such as item sellers or producers who desire a fair representation of their items. This type of supplier fairness is sometimes accomplished by attempting to increasing aggregate diversity in order to mitigate popularity bias and to improve the coverage of long-tail items in recommendations. In this paper, we introduce FairMatch, a general graph-based algorithm that works as a post processing approach after recommendation generation to improve exposure fairness for items and suppliers. The algorithm iteratively adds high quality items that have low visibility or items from suppliers with low exposure to the users' final recommendation lists. A comprehensive set of experiments on two datasets and comparison with state-of-the-art baselines show that FairMatch, while significantly improves exposure fairness and aggregate diversity, maintains an acceptable level of relevance of the recommendations.

AIJun 21, 2021
Evaluating Team Skill Aggregation in Online Competitive Games

Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell et al.

One of the main goals of online competitive games is increasing player engagement by ensuring fair matches. These games use rating systems for creating balanced match-ups. Rating systems leverage statistical estimation to rate players' skills and use skill ratings to predict rank before matching players. Skill ratings of individual players can be aggregated to compute the skill level of a team. While research often aims to improve the accuracy of skill estimation and fairness of match-ups, less attention has been given to how the skill level of a team is calculated from the skill level of its members. In this paper, we propose two new aggregation methods and compare them with a standard approach extensively used in the research literature. We present an exhaustive analysis of the impact of these methods on the predictive performance of rating systems. We perform our experiments using three popular rating systems, Elo, Glicko, and TrueSkill, on three real-world datasets including over 100,000 battle royale and head-to-head matches. Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases, implying that the overall performance of a team is best determined by the performance of its most skilled member. The results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance -- methods covering different aspects of players' behavior such as skills, strategy, or goals.

IRMay 28, 2021
The Evaluation of Rating Systems in Team-based Battle Royale Games

Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell et al.

Online competitive games have become a mainstream entertainment platform. To create a fair and exciting experience, these games use rating systems to match players with similar skills. While there has been an increasing amount of research on improving the performance of these systems, less attention has been paid to how their performance is evaluated. In this paper, we explore the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches. Our results suggest considerable differences in their evaluation patterns. Some metrics were highly impacted by the inclusion of new players. Many could not capture the real differences between certain groups of players. Among all metrics studied, normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and more flexibility. It alleviated most of the challenges faced by the other metrics while adding the freedom to adjust the focus of the evaluations on different groups of players.

IRMar 11, 2021
Toward the Next Generation of News Recommender Systems

Himan Abdollahpouri, Edward Malthouse, Joseph Konstan et al.

This paper proposes a vision and research agenda for the next generation of news recommender systems (RS), called the table d'hote approach. A table d'hote (translates as host's table) meal is a sequence of courses that create a balanced and enjoyable dining experience for a guest. Likewise, we believe news RS should strive to create a similar experience for the users by satisfying the news-diet needs of a user. While extant news RS considers criteria such as diversity and serendipity, and RS bundles have been studied for other contexts such as tourism, table d'hote goes further by ensuring the recommended articles satisfy a diverse set of user needs in the right proportions and in a specific order. In table d'hote, available articles need to be stratified based on the different ways that news can create value for the reader, building from theories and empirical research in journalism and user engagement. Using theories and empirical research from communication on the uses and gratifications (U&G) consumers derive from media, we define two main strata in a table d'hote news RS, each with its own substrata: 1) surveillance, which consists of information the user needs to know, and 2) serendipity, which are the articles offering unexpected surprises. The diversity of the articles according to the defined strata and the order of the articles within the list of recommendations are also two important aspects of the table d'hote in order to give the users the most effective reading experience. We propose our vision, link it to the existing concepts in the RS literature, and identify challenges for future research.

IRMar 10, 2021
User-centered Evaluation of Popularity Bias in Recommender Systems

Himan Abdollahpouri, Masoud Mansoury, Robin Burke et al.

Recommendation and ranking systems are known to suffer from popularity bias; the tendency of the algorithm to favor a few popular items while under-representing the majority of other items. Prior research has examined various approaches for mitigating popularity bias and enhancing the recommendation of long-tail, less popular, items. The effectiveness of these approaches is often assessed using different metrics to evaluate the extent to which over-concentration on popular items is reduced. However, not much attention has been given to the user-centered evaluation of this bias; how different users with different levels of interest towards popular items are affected by such algorithms. In this paper, we show the limitations of the existing metrics to evaluate popularity bias mitigation when we want to assess these algorithms from the users' perspective and we propose a new metric that can address these limitations. In addition, we present an effective approach that mitigates popularity bias from the user-centered point of view. Finally, we investigate several state-of-the-art approaches proposed in recent years to mitigate popularity bias and evaluate their performances using the existing metrics and also from the users' perspective. Our experimental results using two publicly-available datasets show that existing popularity bias mitigation techniques ignore the users' tolerance towards popular items. Our proposed user-centered method can tackle popularity bias effectively for different users while also improving the existing metrics.

IRAug 21, 2020
The Connection Between Popularity Bias, Calibration, and Fairness in Recommendation

Himan Abdollahpouri, Masoud Mansoury, Robin Burke et al.

Recently there has been a growing interest in fairness-aware recommender systems including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration for measuring how a recommendation algorithm is responsive to users' true preferences and we consider how various algorithms may result in different degrees of miscalibration for different users. In particular, we conjecture that popularity bias which is a well-known phenomenon in recommendation is one important factor leading to miscalibration in recommendation. Our experimental results using two real-world datasets show that there is a connection between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show that the more a group is affected by the algorithmic popularity bias, the more their recommendations are miscalibrated.

IRAug 15, 2020
The Evaluation of Rating Systems in Online Free-for-All Games

Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell et al.

Online competitive games have become increasingly popular. To ensure an exciting and competitive environment, these games routinely attempt to match players with similar skill levels. Matching players is often accomplished through a rating system. There has been an increasing amount of research on developing such rating systems. However, less attention has been given to the evaluation metrics of these systems. In this paper, we present an exhaustive analysis of six metrics for evaluating rating systems in online competitive games. We compare traditional metrics such as accuracy. We then introduce other metrics adapted from the field of information retrieval. We evaluate these metrics against several well-known rating systems on a large real-world dataset of over 100,000 free-for-all matches. Our results show stark differences in their utility. Some metrics do not consider deviations between two ranks. Others are inordinately impacted by new players. Many do not capture the importance of distinguishing between errors in higher ranks and lower ranks. Among all metrics studied, we recommend Normalized Discounted Cumulative Gain (NDCG) because not only does it resolve the issues faced by other metrics, but it also offers flexibility to adjust the evaluations based on the goals of the system

IRJul 25, 2020
Feedback Loop and Bias Amplification in Recommender Systems

Masoud Mansoury, Himan Abdollahpouri, Mykola Pechenizkiy et al.

Recommendation algorithms are known to suffer from popularity bias; a few popular items are recommended frequently while the majority of other items are ignored. These recommendations are then consumed by the users, their reaction will be logged and added to the system: what is generally known as a feedback loop. In this paper, we propose a method for simulating the users interaction with the recommenders in an offline setting and study the impact of feedback loop on the popularity bias amplification of several recommendation algorithms. We then show how this bias amplification leads to several other problems such as declining the aggregate diversity, shifting the representation of users' taste over time and also homogenization of the users experience. In particular, we show that the impact of feedback loop is generally stronger for the users who belong to the minority group.

IRJul 23, 2020
Addressing the Multistakeholder Impact of Popularity Bias in Recommendation Through Calibration

Himan Abdollahpouri, Masoud Mansoury, Robin Burke et al.

Popularity bias is a well-known phenomenon in recommender systems: popular items are recommended even more frequently than their popularity would warrant, amplifying long-tail effects already present in many recommendation domains. Prior research has examined various approaches for mitigating popularity bias and enhancing the recommendation of long-tail items overall. The effectiveness of these approaches, however, has not been assessed in multistakeholder environments where in addition to the users who receive the recommendations, the utility of the suppliers of the recommended items should also be considered. In this paper, we propose the concept of popularity calibration which measures the match between the popularity distribution of items in a user's profile and that of the recommended items. We also develop an algorithm that optimizes this metric. In addition, we demonstrate that existing evaluation metrics for popularity bias do not reflect the performance of the algorithms when it is measured from the perspective of different stakeholders. Using music and movie datasets, we empirically show that our approach outperforms the existing state-of-the-art approaches in addressing popularity bias by calibrating the recommendations to users' preferences. We also show that our proposed algorithm has a secondary effect of improving supplier fairness.

IRJun 5, 2020
Using Stable Matching to Optimize the Balance between Accuracy and Diversity in Recommendation

Farzad Eskandanian, Bamshad Mobasher

Increasing aggregate diversity (or catalog coverage) is an important system-level objective in many recommendation domains where it may be desirable to mitigate the popularity bias and to improve the coverage of long-tail items in recommendations given to users. This is especially important in multistakeholder recommendation scenarios where it may be important to optimize utilities not just for the end user, but also for other stakeholders such as item sellers or producers who desire a fair representation of their items across recommendation lists produced by the system. Unfortunately, attempts to increase aggregate diversity often result in lower recommendation accuracy for end users. Thus, addressing this problem requires an approach that can effectively manage the trade-offs between accuracy and aggregate diversity. In this work, we propose a two-sided post-processing approach in which both user and item utilities are considered. Our goal is to maximize aggregate diversity while minimizing loss in recommendation accuracy. Our solution is a generalization of the Deferred Acceptance algorithm which was proposed as an efficient algorithm to solve the well-known stable matching problem. We prove that our algorithm results in a unique user-optimal stable match between items and users. Using three recommendation datasets, we empirically demonstrate the effectiveness of our approach in comparison to several baselines. In particular, our results show that the proposed solution is quite effective in increasing aggregate diversity and item-side utility while optimizing recommendation accuracy for end users.

IRMay 21, 2020
Opportunistic Multi-aspect Fairness through Personalized Re-ranking

Nasim Sonboli, Farzad Eskandanian, Robin Burke et al.

As recommender systems have become more widespread and moved into areas with greater social impact, such as employment and housing, researchers have begun to seek ways to ensure fairness in the results that such systems produce. This work has primarily focused on developing recommendation approaches in which fairness metrics are jointly optimized along with recommendation accuracy. However, the previous work had largely ignored how individual preferences may limit the ability of an algorithm to produce fair recommendations. Furthermore, with few exceptions, researchers have only considered scenarios in which fairness is measured relative to a single sensitive feature or attribute (such as race or gender). In this paper, we present a re-ranking approach to fairness-aware recommendation that learns individual preferences across multiple fairness dimensions and uses them to enhance provider fairness in recommendation results. Specifically, we show that our opportunistic and metric-agnostic approach achieves a better trade-off between accuracy and fairness than prior re-ranking approaches and does so across multiple fairness dimensions.

IRMay 3, 2020
FairMatch: A Graph-based Approach for Improving Aggregate Diversity in Recommender Systems

Masoud Mansoury, Himan Abdollahpouri, Mykola Pechenizkiy et al.

Recommender systems are often biased toward popular items. In other words, few items are frequently recommended while the majority of items do not get proportionate attention. That leads to low coverage of items in recommendation lists across users (i.e. low aggregate diversity) and unfair distribution of recommended items. In this paper, we introduce FairMatch, a general graph-based algorithm that works as a post-processing approach after recommendation generation for improving aggregate diversity. The algorithm iteratively finds items that are rarely recommended yet are high-quality and add them to the users' final recommendation lists. This is done by solving the maximum flow problem on the recommendation bipartite graph. While we focus on aggregate diversity and fair distribution of recommended items, the algorithm can be adapted to other recommendation scenarios using different underlying definitions of fairness. A comprehensive set of experiments on two datasets and comparison with state-of-the-art baselines show that FairMatch, while significantly improving aggregate diversity, provides comparable recommendation accuracy.

IRFeb 18, 2020
Investigating Potential Factors Associated with Gender Discrimination in Collaborative Recommender Systems

Masoud Mansoury, Himan Abdollahpouri, Jessie Smith et al.

The proliferation of personalized recommendation technologies has raised concerns about discrepancies in their recommendation performance across different genders, age groups, and racial or ethnic populations. This varying degree of performance could impact users' trust in the system and may pose legal and ethical issues in domains where fairness and equity are critical concerns, like job recommendation. In this paper, we investigate several potential factors that could be associated with discriminatory performance of a recommendation algorithm for women versus men. We specifically study several characteristics of user profiles and analyze their possible associations with disparate behavior of the system towards different genders. These characteristics include the anomaly in rating behavior, the entropy of users' profiles, and the users' profile size. Our experimental results on a public dataset using four recommendation algorithms show that, based on all the three mentioned factors, women get less accurate recommendations than men indicating an unfair nature of recommendation algorithms across genders.

IROct 13, 2019
The Impact of Popularity Bias on Fairness and Calibration in Recommendation

Himan Abdollahpouri, Masoud Mansoury, Robin Burke et al.

Recently there has been a growing interest in fairness-aware recommender systems, including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration for measuring how a recommendation algorithm is responsive to users' true preferences and we consider how various algorithms may result in different degrees of miscalibration. A well-known type of bias in recommendation is popularity bias where few popular items are over-represented in recommendations, while the majority of other items do not get significant exposure. We conjecture that popularity bias is one important factor leading to miscalibration in recommendation. Our experimental results using two real-world datasets show that there is a strong correlation between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show algorithms with greater popularity bias amplification tend to have greater miscalibration.

IRSep 13, 2019
Crank up the volume: preference bias amplification in collaborative recommendation

Kun Lin, Nasim Sonboli, Bamshad Mobasher et al.

Recommender systems are personalized: we expect the results given to a particular user to reflect that user's preferences. Some researchers have studied the notion of calibration, how well recommendations match users' stated preferences, and bias disparity the extent to which mis-calibration affects different user groups. In this paper, we examine bias disparity over a range of different algorithms and for different item categories and demonstrate significant differences between model-based and memory-based algorithms.

IRAug 2, 2019
Bias Disparity in Collaborative Recommendation: Algorithmic Evaluation and Comparison

Masoud Mansoury, Bamshad Mobasher, Robin Burke et al.

Research on fairness in machine learning has been recently extended to recommender systems. One of the factors that may impact fairness is bias disparity, the degree to which a group's preferences on various item categories fail to be reflected in the recommendations they receive. In some cases biases in the original data may be amplified or reversed by the underlying recommendation algorithm. In this paper, we explore how different recommendation algorithms reflect the tradeoff between ranking quality and bias disparity. Our experiments include neighborhood-based, model-based, and trust-aware recommendation algorithms.

IRJul 31, 2019
The Unfairness of Popularity Bias in Recommendation

Himan Abdollahpouri, Masoud Mansoury, Robin Burke et al.

Recommender systems are known to suffer from the popularity bias problem: popular (i.e. frequently rated) items get a lot of exposure while less popular ones are under-represented in the recommendations. Research in this area has been mainly focusing on finding ways to tackle this issue by increasing the number of recommended long-tail items or otherwise the overall catalog coverage. In this paper, however, we look at this problem from the users' perspective: we want to see how popularity bias causes the recommendations to deviate from what the user expects to get from the recommender system. We define three different groups of users according to their interest in popular items (Niche, Diverse and Blockbuster-focused) and show the impact of popularity bias on the users in each group. Our experimental results on a movie dataset show that in many recommendation algorithms the recommendations the users get are extremely concentrated on popular items even if a user is interested in long-tail and non-popular items showing an extreme bias disparity.

IRJul 10, 2019
Flatter is better: Percentile Transformations for Recommender Systems

Masoud Mansoury, Robin Burke, Bamshad Mobasher

It is well known that explicit user ratings in recommender systems are biased towards high ratings, and that users differ significantly in their usage of the rating scale. Implementers usually compensate for these issues through rating normalization or the inclusion of a user bias term in factorization models. However, these methods adjust only for the central tendency of users' distributions. In this work, we demonstrate that lack of \textit{flatness} in rating distributions is negatively correlated with recommendation performance. We propose a rating transformation model that compensates for skew in the rating distribution as well as its central tendency by converting ratings into percentile values as a pre-processing step before recommendation generation. This transformation flattens the rating distribution, better compensates for differences in rating distributions, and improves recommendation performance. We also show a smoothed version of this transformation designed to yield more intuitive results for users with very narrow rating distributions. A comprehensive set of experiments show improved ranking performance for these percentile transformations with state-of-the-art recommendation algorithms in four real-world data sets.

SIMay 14, 2019
Power of the Few: Analyzing the Impact of Influential Users in Collaborative Recommender Systems

Farzad Eskandanian, Nasim Sonboli, Bamshad Mobasher

Like other social systems, in collaborative filtering a small number of "influential" users may have a large impact on the recommendations of other users, thus affecting the overall behavior of the system. Identifying influential users and studying their impact on other users is an important problem because it provides insight into how small groups can inadvertently or intentionally affect the behavior of the system as a whole. Modeling these influences can also shed light on patterns and relationships that would otherwise be difficult to discern, hopefully leading to more transparency in how the system generates personalized content. In this work we first formalize the notion of "influence" in collaborative filtering using an Influence Discrimination Model. We then empirically identify and characterize influential users and analyze their impact on the system under different underlying recommendation algorithms and across three different recommendation domains: job, movie and book recommendations. Insights from these experiments can help in designing systems that are not only optimized for accuracy, but are also tuned to mitigate the impact of influential users when it might lead to potential imbalance or unfairness in the system's outcomes.

IRMay 14, 2019
Modeling the Dynamics of User Preferences for Sequence-Aware Recommendation Using Hidden Markov Models

Farzad Eskandanian, Bamshad Mobasher

In a variety of online settings involving interaction with end-users it is critical for the systems to adapt to changes in user preferences. User preferences on items tend to change over time due to a variety of factors such as change in context, the task being performed, or other short-term or long-term external factors. Recommender systems need to be able to capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework which takes into account the dynamics of user preferences. We propose an approach based on Hidden Markov Models (HMM) to identify change-points in the sequence of user interactions which reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations using a sequence-aware non-negative matrix factorization model. We empirically demonstrate the effectiveness of the HMM-based change detection method as compared to standard baseline methods. Additionally, we evaluate the performance of the proposed recommendation method and show that it compares favorably to state-of-the-art sequence-aware recommendation models.

IRJan 22, 2019
Managing Popularity Bias in Recommender Systems with Personalized Re-ranking

Himan Abdollahpouri, Robin Burke, Bamshad Mobasher

Many recommender systems suffer from popularity bias: popular items are recommended frequently while less popular, niche products, are recommended rarely or not at all. However, recommending the ignored products in the `long tail' is critical for businesses as they are less likely to be discovered. In this paper, we introduce a personalized diversification re-ranking approach to increase the representation of less popular items in recommendations while maintaining acceptable recommendation accuracy. Our approach is a post-processing step that can be applied to the output of any recommender system. We show that our approach is capable of managing popularity bias more effectively, compared with an existing method based on regularization. We also examine both new and existing metrics to measure the coverage of long-tail items in the recommendation.

IRSep 29, 2018
Detecting Changes in User Preferences using Hidden Markov Models for Sequential Recommendation Tasks

Farzad Eskandanian, Bamshad Mobasher

Recommender systems help users find relevant items of interest based on the past preferences of those users. In many domains, however, the tastes and preferences of users change over time due to a variety of factors and recommender systems should capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework based on Hidden Markov Models (HMM) which takes into account the dynamics of user preferences. We propose a HMM-based approach to change point detection in the sequence of user interactions which reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations in two ways. In one approach change points are used to create a sequence-aware non-negative matrix factorization model to generate recommendations that are aligned with the current tastes of user. In the second approach the HMM is used directly to generate recommendations taking into account the identified change points. These models are evaluated in terms of accuracy of change point detection and also the effectiveness of recommendations using a real music streaming dataset.

IRFeb 15, 2018
Popularity-Aware Item Weighting for Long-Tail Recommendation

Himan Abdollahpouri, Robin Burke, Bamshad Mobasher

Many recommender systems suffer from the popularity bias problem: popular items are being recommended frequently while less popular, niche products, are recommended rarely if not at all. However, those ignored products are exactly the products that businesses need to find customers for and their recommendations would be more beneficial. In this paper, we examine an item weighting approach to improve long-tail recommendation. Our approach works as a simple yet powerful add-on to existing recommendation algorithms for making a tunable trade-off between accuracy and long-tail coverage.

IRFeb 28, 2017
Weighted Random Walk Sampling for Multi-Relational Recommendation

Fatemeh Vahedian, Robin Burke, Bamshad Mobasher

In the information overloaded web, personalized recommender systems are essential tools to help users find most relevant information. The most heavily-used recommendation frameworks assume user interactions that are characterized by a single relation. However, for many tasks, such as recommendation in social networks, user-item interactions must be modeled as a complex network of multiple relations, not only a single relation. Recently research on multi-relational factorization and hybrid recommender models has shown that using extended meta-paths to capture additional information about both users and items in the network can enhance the accuracy of recommendations in such networks. Most of this work is focused on unweighted heterogeneous networks, and to apply these techniques, weighted relations must be simplified into binary ones. However, information associated with weighted edges, such as user ratings, which may be crucial for recommendation, are lost in such binarization. In this paper, we explore a random walk sampling method in which the frequency of edge sampling is a function of edge weight, and apply this generate extended meta-paths in weighted heterogeneous networks. With this sampling technique, we demonstrate improved performance on multiple data sets both in terms of recommendation accuracy and model generation efficiency.