LG · OC · ML · Oct 2, 2023

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

arXiv:2310.00968v2 · 22 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses a gap in decision-making with preferential feedback, which matters for applications such as ranking and recommendation systems. Its originality is incremental, since it builds on existing dueling bandit frameworks.

The paper tackles stochastic contextual dueling bandits by proposing a new algorithm that achieves a variance-aware regret bound of $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$, where $\sigma_t$ is the variance of the pairwise comparison in round $t$. This improves over previous methods, which ignore this uncertainty.
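
To make the scaling of this bound concrete, the sketch below evaluates only its leading-order expression, dropping the constants and logarithmic factors hidden by $\tilde O$. Two points are assumptions rather than claims from the paper: $\sigma_t^2$ is read as the variance of a Bernoulli comparison outcome (so $\sigma_t^2 \le 1/4$), and a variance-agnostic method is taken to scale as $d\sqrt{T}$ for comparison. The function names are hypothetical.

```python
import math
from typing import Sequence


def variance_aware_bound(d: int, sigmas: Sequence[float]) -> float:
    """Leading-order value of d * sqrt(sum_t sigma_t^2) + d.

    Constants and log factors hidden by tilde-O are dropped, so this shows
    scaling only; it is not an actual regret guarantee.
    """
    total_variance = sum(s * s for s in sigmas)
    return d * math.sqrt(total_variance) + d


def variance_agnostic_bound(d: int, T: int) -> float:
    """Assumed d * sqrt(T) scaling for a variance-agnostic baseline."""
    return d * math.sqrt(T)


if __name__ == "__main__":
    d, T = 10, 10_000
    # Deterministic comparisons: every sigma_t = 0, so only the additive d remains.
    print(variance_aware_bound(d, [0.0] * T))
    # Maximal Bernoulli noise: sigma_t = 0.5 (variance 1/4) in every round.
    print(variance_aware_bound(d, [0.5] * T))
    print(variance_agnostic_bound(d, T))
```

With $d = 10$ and $T = 10{,}000$, deterministic comparisons give 10, maximally noisy ones give $10 \cdot 50 + 10 = 510$, and the assumed variance-agnostic scaling gives 1000. Because $\sigma_t^2 \le 1/4$ for a binary outcome, the variance-aware term never exceeds $d\sqrt{T}/2$.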

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$, where $\sigma_t$ is the variance of the pairwise comparison in round $t$, $d$ is the dimension of the context vectors, and $T$ is the time horizon. Our regret bound naturally aligns with intuition: in scenarios where the comparison is deterministic, the algorithm suffers only an $\tilde O(d)$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms.
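
The abstract states only that the binary comparison comes from a GLM. The sketch below assumes one common instance: a logistic link applied to the difference of the two arms' feature vectors, $P(a \succ b) = \mu\big(\theta^\top(\phi_a - \phi_b)\big)$. Under that assumption the comparison outcome is Bernoulli and its variance is $p(1-p)$. This shows why deterministic comparisons ($p \to 0$ or $1$) carry zero variance. The names `theta`, `phi_a`, `phi_b`, and all functions are hypothetical, not the authors' code.

```python
import math
import random
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (assumed GLM link).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_prob(theta: Sequence[float], phi_a: Sequence[float],
                    phi_b: Sequence[float]) -> float:
    # P(arm a beats arm b) = mu(theta^T (phi_a - phi_b)).
    z = sum(t * (xa - xb) for t, xa, xb in zip(theta, phi_a, phi_b))
    return sigmoid(z)


def comparison_variance(p: float) -> float:
    # Variance of a Bernoulli(p) comparison outcome.
    return p * (1.0 - p)


def duel(theta: Sequence[float], phi_a: Sequence[float],
         phi_b: Sequence[float], rng: random.Random) -> int:
    # Returns 1 if arm a wins the (simulated) comparison, else 0.
    p = preference_prob(theta, phi_a, phi_b)
    return 1 if rng.random() < p else 0


if __name__ == "__main__":
    phi_a, phi_b = [1.0, 0.0], [0.0, 0.0]
    for scale in (0.0, 1.0, 100.0):
        theta = [scale, 0.0]
        p = preference_prob(theta, phi_a, phi_b)
        print(scale, p, comparison_variance(p))
```

Scaling `theta` up pushes $p$ toward 1, so the per-round variance collapses toward 0. This is the deterministic regime in which the bound above reduces to $\tilde O(d)$.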

Code Implementations: 1 repo