Recommendation Is a Dish Better Served Warm
This paper addresses a methodological inconsistency that matters to researchers and practitioners in recommender systems, but its contribution is incremental: it analyzes thresholds rather than proposing a new solution.
The paper tackles the problem of arbitrary and inconsistent cold-start thresholds in recommender systems, which undermine the comparability of evaluation results. It finds that an inconsistent choice of threshold can lead either to the unnecessary removal of valuable data or to cold instances being misclassified as warm, which introduces noise.
In modern recommender systems, experimental settings typically include filtering out cold users and items based on a minimum interaction threshold. However, these thresholds are often chosen arbitrarily and vary widely across studies, leading to inconsistencies that can significantly affect the comparability and reliability of evaluation results. In this paper, we systematically explore the cold-start boundary by examining the criteria used to determine whether a user or an item should be considered cold. Our experiments incrementally vary the number of interactions for different items during training, and gradually update the length of user interaction histories during inference. We investigate the thresholds across several widely used datasets, commonly represented in recent papers from top-tier conferences, and on multiple established recommender baselines. Our findings show that inconsistent selection of cold-start thresholds can either result in the unnecessary removal of valuable data or lead to the misclassification of cold instances as warm, introducing more noise into the system.
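To make the filtering step concrete, the following is a minimal sketch of a minimum-interaction filter, not the paper's actual preprocessing. The function name `filter_cold`, its parameters, and the toy data are hypothetical, and the sketch assumes a single filtering pass over (user, item) pairs; many pipelines instead repeat the filter until no user or item falls below the threshold.

```python
from collections import Counter


def filter_cold(interactions, min_user_inter=5, min_item_inter=5):
    """Drop interactions whose user or item falls below a threshold.

    Assumption: counts are computed once on the unfiltered data
    (single pass), so the result is not guaranteed to be a k-core.
    """
    user_counts = Counter(user for user, _ in interactions)
    item_counts = Counter(item for _, item in interactions)
    return [
        (user, item)
        for user, item in interactions
        if user_counts[user] >= min_user_inter
        and item_counts[item] >= min_item_inter
    ]


# Hypothetical toy data: user "a" has 3 interactions, "b" has 2, "c" has 1.
toy = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("b", "y"), ("c", "x")]

# Sweep the threshold to see how much data each choice retains.
for threshold in (1, 2, 3):
    kept = filter_cold(toy, min_user_inter=threshold, min_item_inter=threshold)
    print(f"threshold={threshold}: kept {len(kept)} of {len(toy)} interactions")
```

On this toy data, raising the threshold from 1 to 3 shrinks the retained set from 6 interactions to 1, which illustrates how strongly the choice of threshold alone can reshape an evaluation dataset.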