Assessing Fashion Recommendations: A Multifaceted Offline Evaluation Approach
This work addresses the need for better evaluation strategies in fashion recommendation. The contribution is incremental, as it adapts existing methods to a specific domain.
The paper tackles the problem of evaluating fashion recommender systems by proposing a multifaceted offline evaluation approach that spans multiple metrics and user segments. It finds that an algorithm's suitability for fashion users can be determined only by considering performance across these dimensions.
Fashion is a unique domain for developing recommender systems (RS). Personalization is critical to fashion users. As a result, highly accurate recommendations are not sufficient unless they are also tailored to individual users. Moreover, fashion data is characterized by a large majority of new users, so a recommendation strategy that performs well only for users with prior interaction history is a poor fit for the fashion problem. Critical to addressing these issues in fashion recommendation is an evaluation strategy that 1) includes multiple metrics that are relevant to fashion, and 2) is performed within segments of users with different interaction histories. Here, we present our multifaceted offline strategy for evaluating fashion RS. Using our proposed evaluation methodology, we compare the performance of three algorithms: a most popular (MP) items strategy, a collaborative filtering (CF) strategy, and a content-based (CB) strategy. We demonstrate that only by considering the performance of these algorithms across multiple metrics and user segments can we determine the extent to which each algorithm is likely to fulfill fashion users' needs.
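To make the two ingredients of the evaluation strategy concrete, the following is a minimal sketch of computing several metrics within user segments. It is an illustration under stated assumptions, not the evaluation code used in this work: the metric choices (hit rate at k for accuracy, and one minus the mean pairwise Jaccard overlap of recommendation lists as a personalization proxy), the segment boundaries in `segment_of`, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hit_rate_at_k(recs, held_out, k):
    """Fraction of users with at least one held-out item in their top-k list."""
    if not recs:
        return 0.0
    hits = sum(1 for u, items in recs.items()
               if set(items[:k]) & held_out.get(u, set()))
    return hits / len(recs)


def personalization_at_k(recs, k):
    """1 minus the mean pairwise Jaccard overlap between users' top-k lists.

    Returns 0.0 when every user receives the same list (e.g. a pure
    most-popular strategy) and 1.0 when no two lists share an item.
    """
    lists = [set(items[:k]) for items in recs.values()]
    overlaps = [len(a & b) / len(a | b)
                for a, b in combinations(lists, 2) if a | b]
    if not overlaps:
        return 0.0
    return 1.0 - sum(overlaps) / len(overlaps)


def segment_of(n_interactions):
    """Assumed segment boundaries for illustration; not taken from the paper."""
    if n_interactions == 0:
        return "new"
    return "light" if n_interactions < 5 else "heavy"


def evaluate_by_segment(recs, held_out, history_counts, k=3):
    """Compute every metric separately within each user segment."""
    by_segment = defaultdict(dict)
    for user, items in recs.items():
        by_segment[segment_of(history_counts.get(user, 0))][user] = items
    return {
        seg: {
            "hit_rate": hit_rate_at_k(seg_recs, held_out, k),
            "personalization": personalization_at_k(seg_recs, k),
        }
        for seg, seg_recs in by_segment.items()
    }
```

Here `recs` maps each user to a ranked list of recommended items, `held_out` maps each user to the set of items withheld for testing, and `history_counts` maps each user to the number of prior interactions. Reporting the result per segment, rather than as one aggregate, is what lets a strategy that is accurate but identical for every new user be distinguished from one that is both accurate and user-specific.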