Understanding the Effects of the Baidu-ULTR Logging Policy on Two-Tower Models
This work addresses a potential limitation in industry applications of two-tower models for unbiased learning to rank, though the contribution is incremental: it tests previously raised concerns on a new dataset.
The paper investigates the logging policy confounding problem in two-tower models for unbiased learning to rank using the Baidu-ULTR dataset. It finds that the conditions for confounding are present but do not significantly affect model performance, and it identifies a mismatch between expert annotations and user clicks.
Despite the popularity of the two-tower model for unbiased learning to rank (ULTR) tasks, recent work suggests that it suffers from a major limitation that could lead to its collapse in industry applications: the problem of logging policy confounding. Several potential solutions have been proposed; however, these methods were mostly evaluated in semi-synthetic simulation experiments. This paper bridges the gap between theory and practice by investigating the confounding problem on the largest real-world dataset, Baidu-ULTR. Our main contributions are threefold: 1) we show that the conditions for the confounding problem are met on Baidu-ULTR, 2) we find that the confounding problem has no significant effect on the two-tower model, and 3) we point to a potential mismatch between expert annotations, the gold standard in ULTR, and user click behavior.