Towards Resolving Propensity Contradiction in Offline Recommender Learning
This work resolves a critical self-contradiction in propensity-based methods for recommender systems, benefiting researchers and practitioners dealing with biased feedback data.
The paper tackles selection bias in offline recommender learning by addressing a contradiction in inverse propensity score (IPS) methods: they need missing-completely-at-random (MCAR) data to be effective, even though they aim to work without it. The result is a novel algorithm that, in extensive experiments, outperforms existing methods on rating prediction and ranking metrics without needing true propensity information.
We study offline recommender learning from explicit rating feedback in the presence of selection bias. A promising current solution to this bias is inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from bias in the estimated propensities. In fact, most previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to estimate the propensity accurately. This leads to a critical self-contradiction: IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random (MNAR) feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm that minimizes the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, and thus yield a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods in both rating prediction and ranking metrics in practical settings without MCAR data.
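
To make the contradiction concrete, the following is a minimal sketch of a standard IPS loss estimator on synthetic data. It is not the algorithm proposed in this work. The data-generating process, the propensity values (0.5 and 0.05), the constant predictor, and the function names `naive_loss` and `ips_loss` are all assumptions chosen for illustration. The sketch compares the loss averaged over observed entries, IPS with the true propensities, and IPS with a uniform propensity guess standing in for what might be used when no MCAR data is available.

```python
import numpy as np


def naive_loss(errors, observed):
    """Average the per-entry loss over observed entries only."""
    return errors[observed].mean()


def ips_loss(errors, observed, propensity):
    """IPS estimate of the average loss over ALL user-item pairs."""
    return (observed * errors / propensity).sum() / errors.size


rng = np.random.default_rng(0)
n_users, n_items = 2000, 200

# Hypothetical ground truth: ratings drawn uniformly from {1, ..., 5}.
ratings = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
predictions = np.full_like(ratings, 3.0)  # toy constant predictor
errors = (ratings - predictions) ** 2     # per-entry squared error

# Assumed MNAR mechanism: high ratings are observed far more often.
true_propensity = np.where(ratings >= 4, 0.5, 0.05)
observed = rng.random(ratings.shape) < true_propensity

# Target quantity: the loss over every user-item pair (unknown in practice).
true_loss = errors.mean()

naive = naive_loss(errors, observed)
ips_true = ips_loss(errors, observed, true_propensity)

# Without MCAR data the true propensity is unknown. A uniform guess
# (the overall observation rate) makes IPS collapse to the naive estimate.
uniform_guess = np.full_like(ratings, observed.mean())
ips_wrong = ips_loss(errors, observed, uniform_guess)

print(f"true loss            : {true_loss:.3f}")
print(f"naive (observed only): {naive:.3f}")
print(f"IPS, true propensity : {ips_true:.3f}")
print(f"IPS, uniform guess   : {ips_wrong:.3f}")
```

In this toy setup, IPS with the true propensities should land close to the true loss, while the naive estimate is biased because observed entries over-represent high ratings. The uniform guess gives exactly the naive estimate, which illustrates why IPS depends on accurate propensities and therefore, in practice, on MCAR data.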