Joseph Lazzaro

ML
h-index 9
4 papers
4 citations
Novelty 53%
AI Score 49

4 Papers

59.5 ML Apr 27
A Finite Time Analysis of Thompson Sampling for Bayesian Optimization with Preferential Feedback

Joseph Lazzaro, Davide Buffelli, Da-shan Shiu et al.

Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson Sampling (TS) approach to Bayesian optimization with preferential feedback that models comparisons using a monotone link on latent utility differences and leverages the dueling kernel induced by a base kernel. We provide a finite-time analysis showing that the performance of the proposed method matches that of standard TS for conventional Bayesian optimization with scalar feedback. The analysis exploits the anchor invariance of TS for challenger selection and introduces a double-TS pairing variant. We also demonstrate the performance of the method on both synthetic and real-world examples.
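The preference model described in this abstract can be illustrated with a minimal sketch. It assumes a squared-exponential base kernel, a logistic link, and a one-dimensional input space; none of these choices are confirmed by the abstract, and the function names are hypothetical. The dueling kernel shown is the covariance of utility differences f(x) - f(x') under a Gaussian process prior with the base kernel.

```python
import math

def rbf(x, y, lengthscale=0.2):
    """Assumed base kernel: squared-exponential on scalars."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def dueling_kernel(pair_a, pair_b, base=rbf):
    """Covariance of f(x) - f(x2) and f(y) - f(y2) when f has kernel `base`."""
    (x, x2), (y, y2) = pair_a, pair_b
    return base(x, y) + base(x2, y2) - base(x, y2) - base(x2, y)

def preference_probability(utility_diff):
    """Assumed monotone link (logistic): P(x preferred to x2) given f(x) - f(x2)."""
    return 1.0 / (1.0 + math.exp(-utility_diff))

if __name__ == "__main__":
    print(dueling_kernel((0.1, 0.9), (0.2, 0.7)))
    print(preference_probability(0.8))
```

Swapping the two items in a pair flips the sign of the dueling kernel, and comparing an item with itself gives zero covariance, which matches the intuition that only utility differences are observed.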

ML Jan 22, 2025
Fixed-Budget Change Point Identification in Piecewise Constant Bandits

Joseph Lazzaro, Ciara Pike-Burke

We study the piecewise constant bandit problem where the expected reward is a piecewise constant function with one change point (discontinuity) across the action space $[0,1]$ and the learner's aim is to locate the change point. Under the assumption of a fixed exploration budget, we provide the first non-asymptotic analysis of policies designed to locate abrupt changes in the mean reward function under bandit feedback. We study the problem in both a large-budget and a small-budget regime, and for each setting establish lower bounds on the error probability and provide algorithms with near-matching upper bounds. Interestingly, our results show a separation in the complexity of the two regimes. We then propose a regime-adaptive algorithm which is near-optimal for both small and large budgets simultaneously. We complement our theoretical analysis with experimental results in simulated environments to support our findings.
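To make the problem setting concrete, here is a small simulation sketch. The environment follows the abstract (one change point on [0, 1], noisy bandit feedback, fixed budget), but the noisy bisection baseline is an illustrative assumption and is not the algorithm analysed in the paper. Gaussian noise, the function names, and all parameter values are likewise hypothetical.

```python
import random

def make_reward(change_point, left_mean, right_mean, noise_sd, rng):
    """Return a pull(x) function for a one-change-point piecewise constant bandit."""
    def pull(x):
        mean = left_mean if x < change_point else right_mean
        return mean + rng.gauss(0.0, noise_sd)  # Gaussian noise is an assumption
    return pull

def noisy_bisection(pull, budget, rounds=8):
    """Naive baseline: split the budget evenly over the endpoints and each midpoint."""
    per_point = max(1, budget // (rounds + 2))

    def average(x):
        return sum(pull(x) for _ in range(per_point)) / per_point

    lo, hi = 0.0, 1.0
    mean_lo, mean_hi = average(lo), average(hi)
    for _ in range(rounds):
        mid = 0.5 * (lo + hi)
        mean_mid = average(mid)
        # Keep the half whose endpoints disagree, i.e. the half holding the jump.
        if abs(mean_mid - mean_lo) <= abs(mean_mid - mean_hi):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    rng = random.Random(0)
    pull = make_reward(0.37, 0.0, 1.0, 0.1, rng)
    print(noisy_bisection(pull, budget=1000))
```

With a small budget, each midpoint estimate becomes noisy and a single wrong comparison sends the search into the wrong half, which gives some intuition for why small and large budgets might call for different strategies.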

LG Jul 30, 2025
Locally Differentially Private Thresholding Bandits

Annalisa Barbara, Joseph Lazzaro, Ciara Pike-Burke

This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.
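One common way to build a Bernoulli-based locally private response is sketched below. It assumes rewards lie in [0, 1] and uses a standard randomized-response style mechanism; the abstract does not specify the exact mechanism, so this should be read as an assumption rather than the paper's construction. The function names are hypothetical.

```python
import math
import random

def response_probability(reward, epsilon):
    """Probability of releasing 1 for a reward in [0, 1] (assumed mechanism)."""
    e = math.exp(epsilon)
    return (1.0 + reward * (e - 1.0)) / (e + 1.0)

def privatize(reward, epsilon, rng):
    """Release a single private bit instead of the raw reward."""
    return 1 if rng.random() < response_probability(reward, epsilon) else 0

def debias(private_mean, epsilon):
    """Invert the affine map so the average of private bits estimates the true mean."""
    e = math.exp(epsilon)
    return (private_mean * (e + 1.0) - 1.0) / (e - 1.0)

if __name__ == "__main__":
    rng = random.Random(1)
    epsilon, true_mean = 1.0, 0.7
    bits = [privatize(true_mean, epsilon, rng) for _ in range(20000)]
    print(debias(sum(bits) / len(bits), epsilon))  # noisy estimate of true_mean
```

For any two rewards, the probabilities of releasing a given bit differ by a factor of at most e^epsilon, which is the local differential privacy condition. The debiased estimate can then be compared against the threshold, at the cost of extra variance for small epsilon.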

ML Jul 11, 2025
Fixed-Confidence Multiple Change Point Identification under Bandit Feedback

Joseph Lazzaro, Ciara Pike-Burke

Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as quickly as possible. For this, we introduce a fixed-confidence piecewise constant bandit problem. Here, we sequentially query points in the domain and receive noisy evaluations of the function under bandit feedback. We provide instance-dependent lower bounds for the complexity of change point identification in this problem. These lower bounds illustrate that an optimal method should focus its sampling efforts adjacent to each of the change points, and the number of samples around each change point should be inversely proportional to the magnitude of the change. Building on this, we devise a simple and computationally efficient variant of Track-and-Stop and prove that it is asymptotically optimal in many regimes. We support our theoretical findings with experimental results in synthetic environments demonstrating the efficiency of our method.
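The allocation rule stated in this abstract can be sketched numerically. The snippet takes the abstract's wording at face value (sampling effort per change point inversely proportional to the magnitude of the change) and assumes the jump sizes are known, which they are not in the actual bandit problem; the function name is hypothetical and this is not the paper's Track-and-Stop variant.

```python
def allocation_weights(jump_sizes):
    """Fraction of samples per change point, proportional to 1 / |jump size|."""
    inverse = [1.0 / abs(jump) for jump in jump_sizes]
    total = sum(inverse)
    return [value / total for value in inverse]

if __name__ == "__main__":
    # Two change points: a jump of size 1.0 and a smaller jump of size 0.5.
    print(allocation_weights([1.0, 0.5]))
```

Under this rule the smaller jump receives twice the sampling share of the larger one, reflecting that subtle changes are harder to localise.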