IR, Jan 21, 2024
What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders
Ruixuan Sun, Xinyi Wu, Avinash Akella et al.
In the past decade, deep learning (DL) models have gained prominence for their exceptional accuracy on benchmark datasets in recommender systems (RecSys). However, their evaluation has primarily relied on offline metrics, overlooking direct user perception and experience. To address this gap, we conduct a human-centric evaluation case study of four leading DL-RecSys models in the movie domain. We test how different DL-RecSys models perform in personalized recommendation generation by conducting a survey study with 445 real active users. We find some DL-RecSys models to be superior in recommending novel and unexpected items and weaker in diversity, trustworthiness, transparency, accuracy, and overall user satisfaction compared to classic collaborative filtering (CF) methods. To further explain the reasons behind the underperformance, we apply a comprehensive path analysis. We discover that the lack of diversity and too much serendipity from DL models can negatively impact the perceived transparency and personalization of recommendations. Such a path ultimately leads to lower summative user satisfaction. Qualitatively, we confirm with real user quotes that accuracy plus at least one other attribute is necessary to ensure a good user experience, while users' demands for transparency and trust cannot be neglected. Based on our findings, we discuss future human-centric DL-RecSys design and optimization strategies.
IR, Oct 3, 2013
Differential Data Analysis for Recommender Systems
Richard Chow, Hongxia Jin, Bart Knijnenburg et al.
We present techniques to characterize which data is important to a recommender system and which is not. Important data is data that contributes most to the accuracy of the recommendation algorithm, while less important data contributes less to the accuracy or even decreases it. Characterizing the importance of data has two potential direct benefits: (1) increased privacy and (2) reduced data management costs, including storage. For privacy, we enable increased recommendation accuracy for comparable privacy levels using existing data obfuscation techniques. For storage, our results indicate that we can achieve large reductions in recommendation data and yet maintain recommendation accuracy. Our main technique is called differential data analysis. The name is inspired by other sorts of differential analysis, such as differential power analysis and differential cryptanalysis, where insight comes through analysis of slightly differing inputs. In differential data analysis, we chunk the data and compare results in the presence or absence of each chunk. We present results applying differential data analysis to two datasets and three different kinds of attributes. The first attribute is called user hardship. This is a novel attribute, particularly relevant to location datasets, that indicates how burdensome a data point was to obtain. The second and third attributes are more standard: timestamp and user rating. For user rating, we confirm previous work concerning the increased importance to the recommender of data corresponding to high and low user ratings.
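
The chunk-and-compare idea described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the item-mean predictor standing in for the recommender, RMSE as the accuracy measure, the toy ratings, and the hypothetical names `item_mean_rmse`, `differential_analysis`, and `rating_bucket`. A positive delta means that removing the chunk raised the error, so the chunk was helping accuracy; a negative delta means the chunk was hurting it.

```python
from collections import defaultdict
from math import sqrt


def item_mean_rmse(train, test):
    """Fit an item-mean predictor on `train` and return its RMSE on `test`.

    Rows are (user, item, rating) tuples. This toy model is only a stand-in
    for a real recommender (an assumption, not the paper's algorithm).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for _user, item, rating in train:
        sums[item] += rating
        counts[item] += 1
    global_mean = sum(rating for _, _, rating in train) / len(train)

    squared_error = 0.0
    for _user, item, rating in test:
        # Fall back to the global mean for items unseen in training.
        pred = sums[item] / counts[item] if counts[item] else global_mean
        squared_error += (rating - pred) ** 2
    return sqrt(squared_error / len(test))


def differential_analysis(train, test, chunk_of):
    """Return {chunk: RMSE without that chunk minus RMSE with all data}."""
    baseline = item_mean_rmse(train, test)
    deltas = {}
    for chunk in {chunk_of(row) for row in train}:
        reduced = [row for row in train if chunk_of(row) != chunk]
        if not reduced:  # nothing left to train on; skip this chunk
            continue
        deltas[chunk] = item_mean_rmse(reduced, test) - baseline
    return deltas


def rating_bucket(row):
    """Hypothetical chunking rule: split rows by how extreme the rating is."""
    return "extreme" if row[2] in (1, 5) else "middle"


# Toy data, invented purely to make the sketch runnable.
train = [
    ("u1", "a", 5), ("u2", "a", 1), ("u3", "a", 3),
    ("u1", "b", 4), ("u2", "b", 2), ("u3", "b", 5),
]
test = [("u4", "a", 3), ("u4", "b", 4)]

deltas = differential_analysis(train, test, rating_bucket)
for chunk, delta in sorted(deltas.items()):
    print(f"{chunk}: change in RMSE when removed = {delta:+.4f}")
```

The same loop would apply to other chunking rules, such as timestamp ranges, by swapping in a different `chunk_of` function. The toy numbers say nothing about which chunks matter in real datasets.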