LG MLMay 4, 2025

Neural Logistic Bandits

arXiv:2505.02069v13 citationsh-index: 1

Originality Incremental advance

AI Analysis

This addresses inefficiencies in bandit algorithms for high-dimensional neural network settings, offering incremental improvements in regret bounds.

The paper tackles the problem of neural logistic bandits by introducing a novel Bernstein-type inequality to reduce dependence on feature dimension and variance, resulting in algorithms with regret bounds of order ̃O(̃d√κT) and ̃O(̃d√T/κ), improving on prior work.

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $κ$, where $1/κ$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $κ$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{κT})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/κ})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

View on arXiv PDF

Similar