LGJun 3, 2025

Optimization of Epsilon-Greedy Exploration

arXiv:2506.03324v12 citationsh-index: 37
Originality Incremental advance
AI Analysis

This work addresses the exploration-exploitation tradeoff in recommendation systems, offering a principled optimization method that is incremental over existing heuristics.

The paper tackles the problem of optimizing exploration rates in epsilon-greedy recommendation systems by proposing a framework that minimizes Bayesian regret using stochastic gradient descent and Model-Predictive Control, demonstrating that it matches or outperforms heuristics across various settings.

Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommendations and how this should evolve over time. While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule based on directly minimizing Bayesian regret through stochastic gradient descent (SGD), allowing for dynamic exploration rate adjustment via Model-Predictive Control (MPC). Through extensive experiments with recommendation datasets, we demonstrate that variations in the batch size across periods significantly influence the optimal exploration strategy. Our optimization methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best heuristic for each setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes