LGMLMay 4, 2025

Neural Logistic Bandits

arXiv:2505.02069v13 citationsh-index: 1
Originality Incremental advance
AI Analysis

This addresses inefficiencies in bandit algorithms for high-dimensional neural network settings, offering incremental improvements in regret bounds.

The paper tackles the problem of neural logistic bandits by introducing a novel Bernstein-type inequality to reduce dependence on feature dimension and variance, resulting in algorithms with regret bounds of order ̃O(̃d√κT) and ̃O(̃d√T/κ), improving on prior work.

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $κ$, where $1/κ$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $κ$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{κT})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/κ})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes