IRAILGSep 26, 2021

Deep Exploration for Recommendation Systems

arXiv:2109.12509v416 citations
Originality Highly original
AI Analysis

This addresses the challenge of sparse rewards and delayed feedback in recommendation systems, offering a novel approach for improving user engagement and system performance.

The paper tackles the problem of learning from delayed feedback in recommendation systems by developing deep exploration methods, which are shown to achieve large improvements over existing algorithms in experiments with industrial-grade simulators.

Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes