IR · AI · LG · Feb 15, 2024

From Variability to Stability: Advancing RecSys Benchmarking Practices

arXiv:2402.09766v2 · 14 citations · h-index: 6 · KDD
Originality: Incremental advance
AI Analysis

This paper addresses inconsistent evaluation practices in RecSys research and offers researchers a standardized approach, though it is an incremental improvement to benchmarking rather than a paradigm shift.

The paper tackles unreliable benchmarking in recommender systems by introducing a methodology that uses 30 diverse open datasets and evaluates 11 collaborative filtering algorithms across 9 metrics, yielding a validated strategy for fair and robust comparisons.

In the rapidly evolving domain of Recommender Systems (RecSys), new algorithms frequently claim state-of-the-art performance based on evaluations over a limited set of arbitrarily selected datasets. However, this approach may fail to holistically reflect their effectiveness due to the significant impact of dataset characteristics on algorithm performance. Addressing this deficiency, this paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms, thereby advancing evaluation practices. By utilizing a diverse set of 30 open datasets, including two introduced in this work, and evaluating 11 collaborative filtering algorithms across 9 metrics, we critically examine the influence of dataset characteristics on algorithm performance. We further investigate the feasibility of aggregating outcomes from multiple datasets into a unified ranking. Through rigorous experimental analysis, we validate the reliability of our methodology under the variability of datasets, offering a benchmarking strategy that balances quality and computational demands. This methodology enables a fair yet effective means of evaluating RecSys algorithms, providing valuable guidance for future research endeavors.
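The abstract mentions aggregating per-dataset outcomes into a unified ranking but does not say here which aggregation rule is used. The sketch below shows one simple, commonly used option, mean rank across datasets with tied scores sharing an average rank. It is an illustration only, not necessarily the paper's method. The function names, algorithm labels and metric values are hypothetical placeholders.

```python
from statistics import mean


def ranks_with_ties(scores: dict[str, float]) -> dict[str, float]:
    """Rank algorithms on one dataset (1 = best; higher metric is better).

    Tied scores receive the average of the ranks they span.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of algorithms with identical scores.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    return ranks


def aggregate_mean_rank(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Aggregate {dataset: {algorithm: metric}} into one ranking by mean rank.

    Lower mean rank is better; ties in mean rank are broken by name.
    """
    per_algo: dict[str, list[float]] = {}
    for scores in results.values():
        for algo, rank in ranks_with_ties(scores).items():
            per_algo.setdefault(algo, []).append(rank)
    return sorted(((a, mean(rs)) for a, rs in per_algo.items()), key=lambda t: (t[1], t[0]))


if __name__ == "__main__":
    # Hypothetical metric values (e.g. one accuracy metric) on three datasets.
    results = {
        "dataset_1": {"algo_A": 0.31, "algo_B": 0.28, "algo_C": 0.12},
        "dataset_2": {"algo_A": 0.22, "algo_B": 0.22, "algo_C": 0.10},
        "dataset_3": {"algo_A": 0.40, "algo_B": 0.38, "algo_C": 0.15},
    }
    for algo, score in aggregate_mean_rank(results):
        print(f"{algo}: mean rank {score:.3f}")
```

Mean rank ignores the size of metric gaps and only uses orderings. This makes it robust to metrics whose scales differ across datasets, but it discards information that score-based aggregations would keep.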

Code Implementations: 1 repo
