Statistical Inference: The Missing Piece of RecSys Experiment Reliability Discourse
This work addresses a methodological gap for researchers and practitioners in recommender systems, but its contribution is incremental: it highlights an existing issue rather than introducing new techniques.
The paper identifies a lack of statistical inference in recommender system evaluations, arguing it is a key missing component, and supports this with a systematic review of recent papers and a survey of related work in information retrieval.
This paper calls attention to a missing component of the recommender system evaluation process: statistical inference. There is active research on several components of the evaluation process, including selecting baselines, standardizing benchmarks, and target item sampling. However, there has not yet been significant work on the role and use of statistical inference for analyzing recommender system evaluation results. In this paper, we argue that statistical inference is a key component of the evaluation process that has not been given sufficient attention. We support this argument with a systematic review of recent RecSys papers to understand how statistical inference is currently being used, along with a brief survey of studies on the use of statistical inference in the information retrieval community. We present several challenges for inference in recommendation experiments, which buttress the need for empirical studies to aid in appropriately selecting and applying statistical inference techniques.