Unbiased Learning to Rank with Biased Continuous Feedback
This work addresses a practical limitation of industrial recommender systems, which need to model both categorical and continuous biased feedback, such as clicks and dwell time.
The paper tackles the problem of learning unbiased ranking models from biased continuous feedback, which existing unbiased learning-to-rank methods cannot properly handle. The proposed method achieves superior results for continuous labels and competitive performance for categorical labels on both public benchmarks and internal live traffic at Tencent News.
Learning an unbiased ranker from biased feedback is a well-known challenge. Unbiased learning-to-rank (LTR) algorithms, which have been verified to model relative relevance accurately from noisy feedback, are appealing candidates and have already been applied in many applications with a single categorical label, such as user click signals. Nevertheless, existing unbiased LTR methods cannot properly handle continuous feedback, which is essential for many industrial applications, such as content recommender systems. To provide personalized, high-quality recommendation results, recommender systems need to model both categorical and continuous biased feedback, such as click and dwell time. Accordingly, we design a novel unbiased LTR algorithm to tackle these challenges. It models position bias in a pairwise fashion and introduces a pairwise trust bias to separate position bias, trust bias, and user relevance explicitly, and it works for both continuous and categorical feedback. Experimental results on public benchmark datasets and on internal live traffic of a large-scale recommender system at Tencent News show that the proposed method achieves superior results for continuous labels and competitive performance for categorical labels.
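To make the idea of pairwise debiasing for continuous labels more concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes, purely for illustration, that the observed feedback gap between two items is an affine distortion of the true relevance gap, with a scale and an offset that depend only on the pair of display positions. The function name `debiased_pairwise_loss` and the parameters `pair_scale` and `pair_offset` are hypothetical.

```python
def debiased_pairwise_loss(scores, feedback, positions, pair_scale, pair_offset):
    """Mean squared error between predicted score gaps and
    bias-corrected feedback gaps over all ordered item pairs.

    Assumed (illustrative) bias model for items shown at positions p, q:
        observed_gap = pair_scale[p][q] * true_gap + pair_offset[p][q]
    """
    total, count = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p, q = positions[i], positions[j]
            observed_gap = feedback[i] - feedback[j]
            # Invert the assumed affine distortion to estimate the true gap.
            corrected_gap = (observed_gap - pair_offset[p][q]) / pair_scale[p][q]
            predicted_gap = scores[i] - scores[j]
            total += (predicted_gap - corrected_gap) ** 2
            count += 1
    return total / count if count else 0.0


# Toy data: true relevance is [3.0, 1.0]; the item at position 0 has its
# advantage inflated (scale 2.0, offset +0.5), giving an observed gap of 4.5.
scores = [3.0, 1.0]            # a ranker that matches the true relevance
feedback = [4.5, 0.0]          # biased continuous feedback, e.g. dwell time
positions = [0, 1]
pair_scale = [[1.0, 2.0], [2.0, 1.0]]
pair_offset = [[0.0, 0.5], [-0.5, 0.0]]

# Baseline with no correction: scale 1 and offset 0 for every position pair.
identity_scale = [[1.0, 1.0], [1.0, 1.0]]
zero_offset = [[0.0, 0.0], [0.0, 0.0]]

debiased = debiased_pairwise_loss(scores, feedback, positions, pair_scale, pair_offset)
naive = debiased_pairwise_loss(scores, feedback, positions, identity_scale, zero_offset)
```

In this toy setting the corrected loss is zero for the ranker that matches the true relevance, while the uncorrected loss penalizes it. In practice the position-pair parameters would have to be estimated jointly with the ranker rather than supplied by hand.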