LG AI CL IR
Oct 17, 2024

Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation

Ryotaro Shimizu, Takashi Wada, Yu Wang, Johannes Kruse, Sean O'Brien, Sai Htaung Kham, Linxin Song, Yuya Yoshikawa, Yuki Saito, Fugee Tsung, Masayuki Goto, Julian McAuley

arXiv:2410.13248v2
7.97 citations
h-index: 49
Has Code
WWW

Originality: Incremental advance

AI Analysis

The paper addresses a gap in explainable recommendation: whether generated explanations accurately reflect users' sentiments toward the recommended items. The advance is incremental, since it builds on existing frameworks and mainly contributes new datasets and evaluation methods.

The paper tackles the problem that explainable recommendation systems often fail to reflect users' post-purchase sentiments. It introduces new datasets and evaluation methods that focus on sentiment alignment, and shows that strong scores on existing text-similarity metrics do not guarantee sentiment-aligned explanations. Existing models improve on this criterion when users' (predicted) ratings are provided as input.

Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users' (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users' sentiments. Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users' sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are directly fed into the models as input. The datasets and benchmark implementation are available at: https://github.com/jchanxtarov/sent_xrec.
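As a rough illustration of the second evaluation idea in the abstract (checking whether an explanation identifies both positive and negative opinions), the sketch below compares gold and predicted opinions at the coarsest possible level: whether each polarity is present at all. The `Opinions` schema and the `polarity_presence_accuracy` function are hypothetical names introduced here for illustration; they are not taken from the paper or its repository, and the paper's actual metrics may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Opinions:
    """Positive and negative opinions for one user-item pair (hypothetical schema)."""
    positive: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)


def polarity_presence_accuracy(gold: list[Opinions], pred: list[Opinions]) -> dict[str, float]:
    """Fraction of examples where the prediction agrees with the gold data on
    whether any positive (resp. negative) opinion is present.

    Illustrative only: this ignores the content of the opinions entirely.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    n = len(gold)
    pos_hits = sum(bool(g.positive) == bool(p.positive) for g, p in zip(gold, pred))
    neg_hits = sum(bool(g.negative) == bool(p.negative) for g, p in zip(gold, pred))
    return {"positive": pos_hits / n, "negative": neg_hits / n}


# Toy data (made up for this example).
gold = [
    Opinions(positive=["soft fabric"], negative=["runs small"]),
    Opinions(positive=["fast shipping"], negative=[]),
]
pred = [
    Opinions(positive=["comfortable"], negative=[]),  # misses the negative opinion
    Opinions(positive=["arrived quickly"], negative=[]),
]

scores = polarity_presence_accuracy(gold, pred)
print(scores)  # {'positive': 1.0, 'negative': 0.5}
```

In this toy case the predictions always mention something positive when the user did, but miss the one negative opinion, which is the kind of failure a text-similarity score alone would not isolate.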
