Overcoming Prior Misspecification in Online Learning to Rank
This paper addresses a key limitation of Bayesian ranking bandits for online learning to rank, though the contribution appears incremental because it builds on existing prior-based methods.
The paper tackles the problem of prior misspecification in online learning to rank, proposing adaptive algorithms that handle mismatched priors and extend to linear and generalized linear models, with experiments showing efficacy on synthetic and real-world data.
The recent literature on online learning to rank (LTR) has established the utility of prior knowledge for Bayesian ranking bandit algorithms. However, a major limitation of existing work is the requirement that the prior used by the algorithm match the true prior. In this paper, we propose and analyze adaptive algorithms that address this issue, and we extend these results to linear and generalized linear models. We also consider scalar relevance feedback in addition to click feedback. Finally, we demonstrate the efficacy of our algorithms in both synthetic and real-world experiments.