LG, IR | Dec 22, 2022

Local Policy Improvement for Recommender Systems

arXiv:2212.11431v210.410 citations | h-index: 46

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of frequent policy updates in recommender systems that aim to improve user engagement. The advance is incremental, as it builds on existing policy optimization approaches.

The paper tackles the policy mismatch problem in recommender systems, where a new policy must be trained on data collected by a previously deployed policy. It proposes a local policy improvement method that optimizes a lower bound of the expected reward without off-policy correction, and it achieves competitive performance in sequential recommendation settings.

Recommender systems predict what items a user will interact with next, based on their past interactions. The problem is often approached through supervised learning, but recent advancements have shifted towards policy optimization of rewards (e.g., user engagement). One challenge with the latter is policy mismatch: we are only able to train a new policy given data collected from a previously-deployed policy. The conventional way to address this problem is through importance sampling correction, but this comes with practical limitations. We suggest an alternative approach of local policy improvement without off-policy correction. Our method computes and optimizes a lower bound of expected reward of the target policy, which is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). This local policy improvement paradigm is ideal for recommender systems, as previous policies are typically of decent quality and policies are updated frequently. We provide empirical evidence and practical recipes for applying our technique in a sequential recommendation setting.
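
To make the idea of a density-ratio-free lower bound concrete, here is a minimal sketch on a toy one-step problem. It uses the elementary inequality x >= 1 + log x, which is one standard way to obtain such a bound for nonnegative rewards; this is an assumption made for illustration, and the paper's exact bound and its regularization may differ. All names (`log_lower_bound`, `logged_surrogate`, the toy policies and rewards) are hypothetical.

```python
import math

def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(policy, rewards):
    """Exact expected reward of a policy over a small discrete action set."""
    return sum(p * r for p, r in zip(policy, rewards))

def log_lower_bound(target, logging, rewards):
    """Lower bound on expected_reward(target, rewards).

    Assumes rewards >= 0 and strictly positive probabilities.
    Uses x >= 1 + log(x) with x = target / logging, so that
    sum_a logging*r*(target/logging) >= sum_a logging*r*(1 + log target - log logging).
    """
    return sum(
        m * r * (1.0 + math.log(t) - math.log(m))
        for t, m, r in zip(target, logging, rewards)
    )

def logged_surrogate(target, logged):
    """Sample estimate of the only target-dependent term of the bound.

    `logged` holds (action, reward) pairs drawn from the logging policy.
    No logging-policy probabilities (and hence no density ratios) appear here.
    """
    return sum(r * math.log(target[a]) for a, r in logged) / len(logged)

# Toy setup (hypothetical numbers): four items, nonnegative rewards.
logging_policy = softmax([0.0, 0.5, 1.0, -0.5])
target_policy = softmax([0.2, 0.4, 1.3, -0.7])
rewards = [0.0, 1.0, 2.0, 0.5]

true_value = expected_reward(target_policy, rewards)
bound = log_lower_bound(target_policy, logging_policy, rewards)
bound_at_logging = log_lower_bound(logging_policy, logging_policy, rewards)
logging_value = expected_reward(logging_policy, rewards)
```

In this sketch the bound is tight when the target policy equals the logging policy and loosens as the two drift apart, which is consistent with the abstract's point that the approach suits settings where policies are updated frequently and each update stays close to a reasonably good previous policy. The logging-policy term in the bound does not depend on the target policy, so maximizing the bound reduces to maximizing a reward-weighted log-likelihood like `logged_surrogate`.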
