From Clicks to Conversions: Recommendation for long-term reward
This work addresses the challenge of optimizing recommender systems for long-term business outcomes such as sales and retention, which matters to companies that care about more than clicks. The contribution appears incremental, since it builds on an existing simulation environment.
The paper addresses recommender systems that are optimized for short-term rewards such as clicks while ignoring long-term business metrics such as sales or retention. It introduces a framework for modeling long-term rewards in the RecoGym simulation environment, uses it to show the problems caused by last-click attribution, and proposes a simple extension that the authors report leads to state-of-the-art results.
Recommender systems are often optimised for short-term reward: a recommendation is considered successful if a reward (e.g. a click) can be observed immediately after the recommendation. The advantage of this framework is that, with some reasonable (although questionable) assumptions, it allows familiar supervised learning tools to be used for the recommendation task. However, it means that long-term business metrics, e.g. sales or retention, are ignored. In this paper we introduce a framework for modeling long-term rewards in the RecoGym simulation environment. We use this newly introduced functionality to showcase problems introduced by the last-click attribution scheme in the case of conversion-optimized recommendations and propose a simple extension that leads to state-of-the-art results.
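To make the attribution issue concrete, the sketch below shows what last-click attribution does to a single session: every sale is credited entirely to the most recent clicked recommendation, and earlier clicks receive nothing. For contrast it also includes a uniform split across all preceding clicks. This is an illustration only. The `Event` type, the function names, and the uniform scheme are hypothetical; they are neither the RecoGym API nor the extension proposed in the paper, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical session event (not a RecoGym type)."""
    t: int        # time step within the session
    kind: str     # "click" on a recommendation, or "sale"
    product: int  # product recommended (click) or bought (sale)


def last_click_credit(events: List[Event]) -> Dict[int, float]:
    """Credit each sale entirely to the latest click that precedes it.

    Returns a mapping from the index of each click event to its credit.
    """
    credit: Dict[int, float] = {}
    last_click = None
    for i, e in enumerate(events):
        if e.kind == "click":
            last_click = i
            credit.setdefault(i, 0.0)
        elif e.kind == "sale" and last_click is not None:
            credit[last_click] += 1.0
    return credit


def uniform_credit(events: List[Event]) -> Dict[int, float]:
    """Assumed alternative for contrast: split each sale evenly
    across all clicks that precede it."""
    credit: Dict[int, float] = {}
    clicks: List[int] = []
    for i, e in enumerate(events):
        if e.kind == "click":
            clicks.append(i)
            credit.setdefault(i, 0.0)
        elif e.kind == "sale" and clicks:
            share = 1.0 / len(clicks)
            for c in clicks:
                credit[c] += share
    return credit


# Two clicked recommendations, then a sale of the first product.
session = [Event(0, "click", 3), Event(1, "click", 7), Event(2, "sale", 3)]
print(last_click_credit(session))  # {0: 0.0, 1: 1.0}
print(uniform_credit(session))     # {0: 0.5, 1: 0.5}
```

In this toy session the sale is for product 3, yet last-click attribution gives all the credit to the later click on product 7. A model trained on such labels would learn from the wrong recommendation, which is the kind of problem the abstract refers to.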