CY · Mar 6
Human, Algorithm, or Both? Gender Bias in Human-Augmented Recruiting
Mesut Kaya, Toine Bogers
Recent years have seen rapid growth in the market for HR technology and AI-driven HR solutions in particular. This popularity has also resulted in increased attention to the negative aspects of using AI to support hiring practices, such as the risk of reinforcing existing biases against vulnerable groups based on gender or other sensitive attributes. Combining human experience with AI efficiency in making recruiting and selection decisions has the potential to help mitigate these biases, but despite a considerable amount of research on fairness in algorithmic hiring, actual empirical evaluations comparing the fairness of human, AI, and human-augmented decision-making remain scarce. In this study, we address this gap by presenting a quantitative analysis of gender bias across three scenarios of a real-world recruitment platform: (1) recruiters searching a CV database manually for relevant candidates, (2) AI-driven matching between candidates and jobs, and (3) a combination of human and AI-driven recruiting. We find that human recruiters produce lists of candidates that are fairer in terms of gender than the AI-only solution, with more deliberation by humans resulting in fairer outcomes. However, the combination of human and AI-driven recruiting is more than the sum of its parts and produces the fairest candidate lists: interacting with the slate of recommended candidates first, before manually searching for additional candidates, has a beneficial effect on the gender fairness of the set of candidates that are viewed, clicked, and contacted afterwards. Our work provides one of the first empirical comparisons of fairness across human, AI, and hybrid recruiting processes, offering evidence to inform the development of more equitable hiring practices and highlighting the importance of human oversight for mitigating bias in algorithmic hiring.
IR · Dec 18, 2020
Recommenders with a mission: assessing diversity in news recommendations
Sanne Vrijenhoek, Mesut Kaya, Nadia Metoui et al.
News recommenders help users to find relevant online content and have the potential to fulfill a crucial role in a democratic society, directing the scarce attention of citizens towards the information that is most important to them. Simultaneously, recent concerns about so-called filter bubbles, misinformation and selective exposure are symptomatic of the disruptive potential of these digital news recommenders. Recommender systems can make or break filter bubbles, and as such can be instrumental in creating either a more closed or a more open internet. Current approaches to evaluating recommender systems are often focused on measuring an increase in user clicks and short-term engagement, rather than measuring the user's longer-term interest in diverse and important information. This paper aims to bridge the gap between normative notions of diversity, rooted in democratic theory, and the quantitative metrics necessary for evaluating a recommender system. We propose a set of metrics grounded in social science interpretations of diversity and suggest ways to implement them in practice.
IR · Sep 6, 2020
Contextual Personalized Re-Ranking of Music Recommendations through Audio Features
Boning Gong, Mesut Kaya, Nava Tintarev
Users are able to access millions of songs through music streaming services like Spotify, Pandora, and Deezer. Access to such large catalogs has created a need for relevant song recommendations. However, user preferences are highly subjective in nature and change according to context (e.g., music that is suitable in the morning is not as suitable in the evening). Moreover, the music one user may prefer in a given context may be different from what another user prefers in the same context (i.e., what is considered good morning music differs across users). Accurately representing these preferences is essential to creating accurate and effective song recommendations. User preferences for songs can be based on high-level audio features, such as tempo and valence. In this paper, we therefore propose a contextual re-ranking algorithm, based on audio feature representations of user preferences in specific contextual conditions. We evaluate the performance of our re-ranking algorithm using the #NowPlaying-RS dataset, which consists of user listening events crawled from Twitter and is enriched with song audio features. We compare a global (context for all users) and personalized (context for each user) model based on these audio features. The global model creates an audio feature representation of each contextual condition based on the preferences of all users. Unlike the global model, the personalized model creates user-specific audio feature representations of contextual conditions, and is measured across 333 distinct users. We show that the personalized model outperforms the global model when evaluated using the precision and mean average precision metrics.
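The two evaluation metrics named in this abstract, precision and mean average precision, can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions or cutoffs, so the function names, the cutoff `k`, and the choice to normalize average precision by the number of relevant items are assumptions made here for illustration only.

```python
from typing import Dict, Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (assumed cutoff k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of precision values at each rank where a relevant item appears.

    Assumption: normalized by the total number of relevant items,
    which is one common convention among several.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    rankings: Dict[Hashable, Sequence[Hashable]],
    relevants: Dict[Hashable, Set[Hashable]],
) -> float:
    """Average of per-user average precision over all users with a ranking."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, relevants.get(user, set()))
        for user, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: two users, each with a re-ranked song list.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
relevants = {"u1": {"a", "c"}, "u2": {"y"}}
map_score = mean_average_precision(rankings, relevants)
```

Under these assumptions, comparing a global and a personalized model amounts to computing these scores on each model's re-ranked lists for the same users and held-out listening events.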
IR · Jul 31, 2019
Sudden Death: A New Way to Compare Recommendation Diversification
Derek Bridge, Mesut Kaya, Pablo Castells
This paper describes problems with the current way we compare the diversity of different recommendation lists in offline experiments. We illustrate the problems with a case study. We propose the Sudden Death score as a new and better way of making these comparisons.