IR · AI · LG · Nov 18, 2021

Beyond NDCG: behavioral testing of recommender systems with RecList

arXiv:2111.09963v2 · 32 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more nuanced, real-world testing of recommender systems, which matters to both developers and researchers. Its contribution is incremental, because it builds on existing behavioral testing concepts.

The authors address the problem of evaluating recommender systems beyond traditional metrics such as NDCG. They propose RecList, a behavioral testing methodology that organizes recommender systems by use case and provides a plug-and-play testing procedure. The result is an open-source package, which they use to analyze known algorithms and black-box commercial systems.

As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and deployment-specific tests must be employed to ensure the desired quality in actual deployments. In this paper, we propose RecList, a behavioral-based testing methodology. RecList organizes recommender systems by use case and introduces a general plug-and-play procedure to scale up behavioral testing. We demonstrate its capabilities by analyzing known algorithms and black-box commercial systems, and we release RecList as an open source, extensible package for the community.
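To make the contrast between an aggregate metric and a behavioral test concrete, here is a minimal sketch of one kind of check the abstract alludes to: computing a metric per slice of the test set rather than only overall. It does not use the RecList API. Every name in it (`hit_rate_at_k`, `hit_rate_by_slice`, `popularity_model`, the toy catalog) is a hypothetical chosen for illustration, and the model is treated as a black box that maps a query item to a ranked list.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical and for illustration only;
# this is NOT the RecList API.
Recommender = Callable[[str], List[str]]  # query item -> ranked item ids
TestCase = Tuple[str, str]                # (query item, held-out target item)


def hit_rate_at_k(model: Recommender, cases: List[TestCase], k: int) -> float:
    """Fraction of cases whose target appears in the model's top-k list."""
    if not cases:
        return 0.0
    hits = sum(1 for query, target in cases if target in model(query)[:k])
    return hits / len(cases)


def hit_rate_by_slice(
    model: Recommender,
    cases: List[TestCase],
    slice_of: Callable[[TestCase], str],
    k: int,
) -> Dict[str, float]:
    """Hit rate computed separately for each slice of the test set."""
    buckets: Dict[str, List[TestCase]] = defaultdict(list)
    for case in cases:
        buckets[slice_of(case)].append(case)
    return {name: hit_rate_at_k(model, group, k) for name, group in buckets.items()}


# Toy catalog (assumed data): item id -> product category.
CATEGORY = {
    "shoe_a": "shoes", "shoe_b": "shoes", "shoe_c": "shoes",
    "hat_a": "hats", "hat_b": "hats",
}


def popularity_model(query: str) -> List[str]:
    """Toy black-box model that ignores the query and returns popular items."""
    return ["shoe_a", "shoe_b", "shoe_c"]


CASES: List[TestCase] = [
    ("shoe_a", "shoe_b"),
    ("shoe_b", "shoe_c"),
    ("shoe_c", "shoe_a"),
    ("hat_a", "hat_b"),
]

overall = hit_rate_at_k(popularity_model, CASES, k=3)
per_slice = hit_rate_by_slice(
    popularity_model, CASES, slice_of=lambda case: CATEGORY[case[0]], k=3
)
print(overall)    # 0.75
print(per_slice)  # {'shoes': 1.0, 'hats': 0.0}
```

In this toy setup the overall hit rate looks acceptable, while the per-slice view shows that the model never serves the "hats" category. That is the kind of deployment-specific failure a held-out aggregate can hide.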

Code Implementations · 3 repos
