Measuring "Why" in Recommender Systems: a Comprehensive Survey on the Evaluation of Explainable Recommendation
The survey addresses the need for systematic evaluation methods in explainable recommendation, with the aim of improving transparency and user trust; its contribution is incremental, since it synthesizes existing work.
This paper tackles the problem of evaluating explanations in recommender systems by surveying more than 100 papers, then analyzing and comparing their evaluation strategies to provide guidelines for choosing among them.
Explainable recommendation has shown clear advantages in improving recommendation persuasiveness, user satisfaction, and system transparency, among others. A fundamental problem of explainable recommendation is how to evaluate the explanations. In the past few years, various evaluation strategies have been proposed. However, they are scattered across different papers, and a systematic and detailed comparison between them is lacking. To bridge this gap, in this paper we comprehensively review the previous work and provide taxonomies for it according to evaluation perspectives and evaluation methods. Beyond summarizing the previous work, we also analyze the (dis)advantages of existing evaluation methods and provide a series of guidelines on how to select them. The contents of this survey are based on more than 100 papers from top-tier conferences such as IJCAI, AAAI, TheWebConf, RecSys, UMAP, and IUI, and a complete summary of them is presented at https://shimo.im/sheets/VKrpYTcwVH6KXgdy/MODOC/. With this survey, we aim to provide a clear and comprehensive review of the evaluation of explainable recommendation.