LG AI ML
Oct 12, 2025

Provable Anytime Ensemble Sampling Algorithms in Nonlinear Contextual Bandits

arXiv:2510.10730v11 citations
h-index: 3

Originality: Incremental advance

AI Analysis

The paper provides a provable and practical randomized exploration approach for nonlinear contextual bandits, covering both generalized linear and neural reward models. The advance is incremental, since it builds on existing ensemble sampling methods.

The paper studies ensemble sampling in nonlinear contextual bandits and develops two algorithms, GLM-ES and Neural-ES, with regret bounds of $\mathcal{O}(d^{3/2} \sqrt{T} + d^{9/2})$ and $\mathcal{O}(\widetilde{d} \sqrt{T})$ respectively. These bounds match state-of-the-art results for randomized exploration algorithms in nonlinear contextual bandit settings.
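
To read the GLM-ES bound, it helps to see when the $\sqrt{T}$ term dominates the additive $d^{9/2}$ term. The comparison below is only a rough guide, assuming the constants and logarithmic factors hidden by the $\mathcal{O}(\cdot)$ notation are ignored:

$$
d^{3/2}\sqrt{T} \;\ge\; d^{9/2}
\;\iff\; \sqrt{T} \;\ge\; d^{3}
\;\iff\; T \;\ge\; d^{6}.
$$

For example, with $d = 10$ the $d^{3/2}\sqrt{T}$ term is the larger of the two once $T \ge 10^{6}$ rounds, under the same simplification.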

We provide a unified algorithmic framework for ensemble sampling in nonlinear contextual bandits and develop corresponding regret bounds for the two most common nonlinear contextual bandit settings: Generalized Linear Ensemble Sampling (\texttt{GLM-ES}) for generalized linear bandits and Neural Ensemble Sampling (\texttt{Neural-ES}) for neural contextual bandits. Both methods maintain multiple estimators for the reward model parameters via maximum likelihood estimation on randomly perturbed data. We prove high-probability frequentist regret bounds of $\mathcal{O}(d^{3/2} \sqrt{T} + d^{9/2})$ for \texttt{GLM-ES} and $\mathcal{O}(\widetilde{d} \sqrt{T})$ for \texttt{Neural-ES}, where $d$ is the dimension of feature vectors, $\widetilde{d}$ is the effective dimension of a neural tangent kernel matrix, and $T$ is the number of rounds. These regret bounds match the state-of-the-art results of randomized exploration algorithms in nonlinear contextual bandit settings. In the theoretical analysis, we introduce techniques that address challenges specific to nonlinear models. Practically, we remove fixed time-horizon assumptions by developing anytime versions of our algorithms, suitable when $T$ is unknown. Finally, we empirically evaluate \texttt{GLM-ES}, \texttt{Neural-ES}, and their anytime variants, demonstrating strong performance. Overall, our results establish ensemble sampling as a provable and practical randomized exploration approach for nonlinear contextual bandits.
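
The core loop described in the abstract (several estimators, each refit by maximum likelihood on its own randomly perturbed copy of the data) can be illustrated with a minimal sketch for a logistic reward model. This is not the paper's GLM-ES: the Gaussian reward perturbation, the ridge penalty `lam`, the Newton-step solver, the ensemble size `m`, and all function names here are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_perturbed_mle(X, y, theta0, lam=1.0, steps=5):
    """Ridge-regularized logistic (quasi-)MLE via a few Newton steps.

    y may be real-valued because it holds perturbed rewards (assumption).
    """
    theta = theta0.copy()
    d = X.shape[1]
    for _ in range(steps):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        w = p * (1.0 - p)
        hess = (X * w[:, None]).T @ X + lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def run_ensemble_sampling(T=200, d=3, n_arms=5, m=4, noise_sd=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d) / np.sqrt(d)  # hypothetical true parameter
    thetas = rng.normal(size=(m, d))              # randomly initialized ensemble
    X_hist = np.empty((0, d))                     # chosen feature vectors so far
    Y_pert = np.empty((m, 0))                     # one perturbed reward stream per member
    regret = 0.0
    for _ in range(T):
        # Fresh unit-norm contexts each round (synthetic environment).
        arms = rng.normal(size=(n_arms, d))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        j = rng.integers(m)                       # pick one ensemble member uniformly
        a = int(np.argmax(arms @ thetas[j]))      # act greedily for that member
        means = sigmoid(arms @ theta_star)
        r = float(rng.random() < means[a])        # Bernoulli reward
        regret += means.max() - means[a]
        X_hist = np.vstack([X_hist, arms[a]])
        # Each member sees the reward plus its own independent Gaussian noise.
        pert = r + noise_sd * rng.normal(size=m)
        Y_pert = np.hstack([Y_pert, pert[:, None]])
        for k in range(m):                        # refit every member, warm-started
            thetas[k] = fit_perturbed_mle(X_hist, Y_pert[k], thetas[k])
    return regret, thetas
```

Calling `run_ensemble_sampling(T=200)` returns the cumulative pseudo-regret on the synthetic problem together with the final ensemble. Refitting every member on the full history each round keeps the sketch short; an incremental update would be the natural choice at larger scale.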

