60.8CRApr 16
Privacy Filters are Captured by Residues: A Characterization of Free Natural Filters and the Cost of AdaptivityMatthew Regehr, Bingshan Hu, Ethan Leeman et al.
We study privacy filters, which enable privacy accounting for differentially private (DP) mechanisms with adaptively chosen privacy characteristics. We develop a general theory that characterizes the worst-case privacy loss of an interaction involving an analyst that respects some restrictions on what queries they may issue. We apply this theory to develop residue filters, which unifies existing privacy filters. We develop the Gaussian DP (GDP) residue filter, which strictly improves upon the naïve GDP filter. We also show that residue filters capture the natural filter, which promises greater utility by leveraging exact privacy accounting techniques. Earlier privacy filters consider only simple privacy parameters such as Rényi-DP or GDP parameters. Natural filters account for the entire privacy profile of every query, promising more efficient use of a given privacy budget. We show that, contrary to other forms of DP, natural privacy filters are not free in general. We present a characterization of when a family of private queries admits free natural filters for a given budget. In particular, only families of privacy mechanisms that are totally-ordered when composed admit free natural privacy filters with respect to an arbitrary privacy budget. Finally, we show that, while the natural approximate-DP filter can fail in the presence of adaptive adversary, it cannot fail too badly: the output remains approximate-DP with parameters at most poly-logarithmically worse than the intended privacy parameters.
33.0LGMay 10Code
First Worst-Case Regret Bounds for Combinatorial Thompson Sampling in Sleeping Semi-BanditsZhiming Huang, Bingshan Hu, Jianping Pan
We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting even without sleeping arms, (ii) the lack of theory under adversarially varying availability, and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these long-standing issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ and a matching lower bound of $\tildeΩ(m\sqrt{NT})$. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms. We show that CL-SG achieves an improved regret bound of $\tilde{O}(\sqrt{mNT})$, together with a matching lower bound $Ω(\sqrt{mNT})$. Experiments on real-world datasets demonstrate that CL-SG consistently outperforms strong baselines including CTS-G and CTS-B, and we open-source our implementation for reproducibility.
LGJun 11, 2025
Efficient kernelized bandit algorithms via exploration distributionsBingshan Hu, Zheng He, Danica J. Sutherland
We consider a kernelized bandit problem with a compact arm set ${X} \subset \mathbb{R}^d $ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(γ_T\sqrt{T})$ regret bounds, where $γ_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that in practice, randomization can yield better practical results.
LGMay 5, 2025
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and RegretBingshan Hu, Zhiming Huang, Tianyue H. Zhang et al. · mila
We address differentially private stochastic bandit problems from the angles of exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables to trade off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-α)}\right)$-GDP and enjoys an $O \left(K\ln^{α+1}(T)/Δ\right)$ regret bound, where $α\in [0,1]$ controls the trade-off between privacy and regret. Theoretically, our DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links exploration mechanisms in Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.
LGMay 2, 2024
Efficient and Adaptive Posterior Sampling Algorithms for BanditsBingshan Hu, Zhiming Huang, Tianyue H. Zhang et al. · mila
We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term %from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$α$) and Thompson Sampling with Timestamp Duelling (TS-TD-$α$), where $α\in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve $O \left(K\ln^{α+1}(T)/Δ\right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $Δ$ denotes the single round performance loss when pulling a sub-optimal arm.
LGFeb 16, 2021
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic EnvironmentBingshan Hu, Zhiming Huang, Nishant A. Mehta et al.
In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: Δ_j>0} \frac{\ln(T)}{\min \left\{Δ_j, ε\right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $Δ_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $ε$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $Ω\left(\frac{\ln(K)}{\min \left\{Δ_{\min}, ε\right\}} \right)$ instance-dependent regret lower bound and an $Ω\left(\sqrt{T\ln(K)} + \frac{\ln(K)}ε\right)$ minimax lower bound, where $K$ is the total number of actions and $Δ_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $ε$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
LGMay 14, 2020
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness ConstraintsZhiming Huang, Yifan Xu, Bingshan Hu et al.
We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints~(CSMAB-F). To address the problem, we adopt Thompson Sampling~(TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called \emph{TS with beta priors and Bernoulli likelihoods for CSMAB-F~(TSCSF-B)}. Further, we prove TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2η} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round~(the cardinality constraint) and $η$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., let $η\rightarrow \infty$), the bound boils down to the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems. Finally, we perform numerical experiments and use a high-rating movie recommendation application to show the effectiveness and efficiency of the proposed algorithm.