Evaluation on Entity Matching in Recommender Systems
This work addresses a bottleneck for researchers in conversational and knowledge-based recommender systems by providing a dataset and an evaluation benchmark. The contribution is incremental, since it focuses on data creation rather than method innovation.
The paper tackles the lack of rigorous evaluation frameworks for cross-dataset entity matching in recommender systems. It introduces Reddit-Amazon-EM, a manually annotated dataset linking movies across Reddit-Movies and Amazon'23, and uses it to evaluate state-of-the-art entity matching methods. The best-performing method is then used to produce a mapping between the two datasets for future research.
Entity matching is a crucial component in various recommender systems, including conversational recommender systems (CRS) and knowledge-based recommender systems. However, the lack of rigorous evaluation frameworks for cross-dataset entity matching impedes progress in areas such as LLM-driven conversational recommendations and knowledge-grounded dataset construction. In this paper, we introduce Reddit-Amazon-EM, a novel dataset comprising naturally occurring items from Reddit and the Amazon'23 dataset. Through careful manual annotation, we identify corresponding movies across Reddit-Movies and Amazon'23, two existing recommender system datasets with inherently overlapping catalogs. Leveraging Reddit-Amazon-EM, we conduct a comprehensive evaluation of state-of-the-art entity matching methods, including rule-based, graph-based, lexical-based, embedding-based, and LLM-based approaches. For reproducible research, we release our manually annotated entity matching gold set and provide the mapping between the two datasets using the best-performing method from our experiments. This serves as a valuable resource for advancing future work on entity matching in recommender systems. Data and code are accessible at: https://github.com/huang-zihan/Reddit-Amazon-Entity-Matching.
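To make the evaluation setup described in the abstract more concrete, the following is a minimal sketch of how a simple lexical matcher could be scored against a manually annotated gold set. It is not the paper's method or the released code: the titles, item IDs, the 0.85 threshold, and the function names (normalize, match_mentions, precision_recall_f1) are all hypothetical and chosen only for illustration, and the similarity measure is Python's standard-library difflib rather than any technique named in the paper.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, drop bracketed suffixes such as "(1999)" or "[Blu-ray]", strip punctuation."""
    title = re.sub(r"[\(\[].*?[\)\]]", " ", title.lower())
    title = re.sub(r"[^a-z0-9 ]", " ", title)
    return " ".join(title.split())


def match_mentions(mentions, catalog, threshold=0.85):
    """Map each mention to the most similar catalog ID, or None if below the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    normalized = {item_id: normalize(title) for item_id, title in catalog.items()}
    result = {}
    for mention in mentions:
        query = normalize(mention)
        best_id, best_score = None, 0.0
        for item_id, title in normalized.items():
            score = SequenceMatcher(None, query, title).ratio()
            if score > best_score:
                best_id, best_score = item_id, score
        result[mention] = best_id if best_score >= threshold else None
    return result


def precision_recall_f1(predicted, gold):
    """Score predicted links against gold links; None means "no match in the catalog"."""
    true_pos = sum(
        1 for m, item_id in predicted.items()
        if item_id is not None and gold.get(m) == item_id
    )
    n_predicted = sum(1 for item_id in predicted.values() if item_id is not None)
    n_gold = sum(1 for item_id in gold.values() if item_id is not None)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): a small product catalog, user mentions, and gold links.
catalog = {
    "A1": "The Matrix (1999) [Blu-ray]",
    "A2": "Inception",
    "A3": "Spirited Away (2001)",
}
mentions = ["the matrix", "Inception (2010)", "Parasite"]
gold = {"the matrix": "A1", "Inception (2010)": "A2", "Parasite": None}

predictions = match_mentions(mentions, catalog)
metrics = precision_recall_f1(predictions, gold)
print(predictions)
print(metrics)
```

A gold set matters here because mentions with no counterpart in the catalog ("Parasite" in the toy data) should be left unmatched. Scoring only the links a method chooses to output would hide both spurious matches and missed ones, which is why the sketch reports precision and recall separately.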