Optimizing an Utility Function for Exploration / Exploitation Trade-off in Context-Aware Recommender System
This work addresses the exploration/exploitation trade-off in contextual recommender systems and offers an incremental improvement over existing methods.
The paper tackles the problem of balancing exploration and exploitation in contextual recommender systems by developing a dynamic strategy that optimizes a utility function based on reward distributions. In evaluations on real event log data, the proposed algorithms outperformed existing ones in click-through rate (CTR).
In this paper, we develop a dynamic exploration/exploitation (exr/exp) strategy for contextual recommender systems (CRS). Specifically, our methods adaptively balance the two aspects of exr/exp by automatically learning the optimal trade-off. This consists of optimizing a utility function represented by a linearized form of the probability distributions of the rewards of the clicked and the non-clicked documents already recommended. Within an offline simulation framework, we apply our algorithms to a CRS and conduct an evaluation with real event log data. The experimental results and detailed analysis demonstrate that our algorithms outperform existing algorithms in terms of click-through rate (CTR).
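
To make the idea of learning the exr/exp trade-off from click feedback concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not specify the linearized utility function, so this sketch substitutes a plain epsilon-greedy policy that picks its exploration rate from a small candidate set according to the CTR observed under each rate. It also ignores the user's context. The names `AdaptiveEpsilonGreedy`, `simulate`, `candidate_epsilons`, and the CTR values in the usage note are all hypothetical.

```python
import random


class AdaptiveEpsilonGreedy:
    """Epsilon-greedy recommender that re-selects epsilon from click feedback.

    Illustrative assumption only: the observed CTR per epsilon stands in for
    the utility function, whose exact form the abstract does not give.
    """

    def __init__(self, documents, candidate_epsilons=(0.0, 0.1, 0.3, 0.5), seed=0):
        self.documents = list(documents)
        self.candidate_epsilons = list(candidate_epsilons)
        self.rng = random.Random(seed)
        # Click and impression counts per document.
        self.clicks = {d: 0 for d in self.documents}
        self.shows = {d: 0 for d in self.documents}
        # Click and impression counts per exploration rate.
        self.eps_clicks = {e: 0 for e in self.candidate_epsilons}
        self.eps_shows = {e: 0 for e in self.candidate_epsilons}

    @staticmethod
    def _rate(clicks, shows):
        return clicks / shows if shows else 0.0

    def choose_epsilon(self):
        # Try every candidate once, then keep the one with the best observed CTR.
        for e in self.candidate_epsilons:
            if self.eps_shows[e] == 0:
                return e
        return max(
            self.candidate_epsilons,
            key=lambda e: self._rate(self.eps_clicks[e], self.eps_shows[e]),
        )

    def recommend(self):
        eps = self.choose_epsilon()
        if self.rng.random() < eps:
            doc = self.rng.choice(self.documents)  # explore
        else:
            doc = max(  # exploit the best empirical CTR so far
                self.documents,
                key=lambda d: self._rate(self.clicks[d], self.shows[d]),
            )
        return doc, eps

    def update(self, doc, eps, clicked):
        self.shows[doc] += 1
        self.clicks[doc] += int(clicked)
        self.eps_shows[eps] += 1
        self.eps_clicks[eps] += int(clicked)


def simulate(true_ctr, rounds=5000, seed=0):
    """Replay simulated clicks and return (overall CTR, trained policy)."""
    rng = random.Random(seed)
    policy = AdaptiveEpsilonGreedy(true_ctr.keys(), seed=seed)
    total_clicks = 0
    for _ in range(rounds):
        doc, eps = policy.recommend()
        clicked = rng.random() < true_ctr[doc]
        policy.update(doc, eps, clicked)
        total_clicks += int(clicked)
    return total_clicks / rounds, policy
```

A call such as `simulate({"a": 0.1, "b": 0.5}, rounds=2000)` returns the overall CTR together with the policy, whose `eps_shows` counts show how often each exploration rate was used. Because the choice among rates is itself greedy, this sketch can lock onto a poor rate early, which is one reason a principled utility function of the kind the abstract describes is attractive.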