IR · AI · Dec 27, 2023

Performance Comparison of Session-based Recommendation Algorithms based on GNNs

arXiv:2312.16695v2 · 8 citations · h-index: 4 · ECIR
Originality: Synthesis-oriented
AI Analysis

This work addresses inconsistent evaluation practices in session-based recommendation research and suggests that reported incremental improvements may be overstated.

The study compared eight recent GNN-based session-based recommendation algorithms under identical conditions, using three common datasets. Simple baseline models outperformed all of them in terms of Mean Reciprocal Rank, and the GNN models won in only three cases in terms of Hit Rate, which points to continuing issues in research methodology.

In session-based recommendation settings, a recommender system has no access to long-term user profiles and thus has to base its suggestions on the user interactions that are observed in an ongoing session. Since such sessions can consist of only a small set of interactions, various approaches based on Graph Neural Networks (GNN) were recently proposed, as they allow us to integrate various types of side information about the items in a natural way. Unfortunately, a variety of evaluation settings are used in the literature, e.g., in terms of protocols, metrics and baselines, making it difficult to assess what represents the state of the art. In this work, we present the results of an evaluation of eight recent GNN-based approaches that were published in high-quality outlets. For a fair comparison, all models are systematically tuned and tested under identical conditions using three common datasets. We furthermore include k-nearest-neighbor and sequential rules-based models as baselines, as such models have previously exhibited competitive performance results for similar settings. To our surprise, the evaluation showed that the simple models outperform all recent GNN models in terms of the Mean Reciprocal Rank, which we used as an optimization criterion, and were only outperformed in three cases in terms of the Hit Rate. Additional analyses furthermore reveal that several other factors that are often not deeply discussed in papers, e.g., random seeds, can markedly impact the performance of GNN-based models. Our results therefore (a) point to continuing issues in the community in terms of research methodology and (b) indicate that there is ample room for improvement in session-based recommendation.
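For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Hit Rate and Mean Reciprocal Rank are commonly computed for next-item prediction. It is not the authors' evaluation code: the function names, the input layout (one ranked recommendation list and one held-out target item per test session), and the cutoff `k=20` are assumptions made for illustration, and the abstract does not state which cutoffs were used.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(ranked_lists: Sequence[Sequence[Hashable]],
                  targets: Sequence[Hashable],
                  k: int = 20) -> float:
    """Fraction of sessions whose target item appears in the top-k list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists: Sequence[Sequence[Hashable]],
             targets: Sequence[Hashable],
             k: int = 20) -> float:
    """Mean of 1/rank of the target item; 0 if it is not in the top-k list."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            # Ranks are 1-based, so the first position contributes 1.0.
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


# Hypothetical toy data: three test sessions with three recommendations each.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["a", "e", "z"]

print(hit_rate_at_k(ranked_lists, targets, k=3))
print(mrr_at_k(ranked_lists, targets, k=3))
```

Hit Rate only records whether the target item is in the list, while Mean Reciprocal Rank also rewards placing it near the top. This is one reason why a model can rank differently under the two metrics, as in the results summarized above.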

