Nikolai Karpov

LG
h-index1
7papers
52citations
Novelty59%
AI Score33

7 Papers

LGAug 18, 2022
Communication-Efficient Collaborative Best Arm Identification

Nikolai Karpov, Qin Zhang

We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function. We are interested in designing collaborative learning algorithms that achieve maximum speedup (compared to single-agent learning algorithms) using minimum communication cost, as communication is frequently the bottleneck in multi-agent learning. We give both algorithmic and impossibility results, and conduct a set of experiments to demonstrate the effectiveness of our algorithms.

LGJul 16, 2022
Parallel Best Arm Identification in Heterogeneous Environments

Nikolai Karpov, Qin Zhang

In this paper, we study the tradeoffs between the time and the number of communication rounds of the best arm identification problem in the heterogeneous collaborative learning model, where multiple agents interact with possibly different environments and they want to learn in parallel an objective function in the aggregated environment. By proving almost tight upper and lower bounds, we show that collaborative learning in the heterogeneous setting is inherently more difficult than that in the homogeneous setting in terms of the time-round tradeoff.

LGJan 26, 2023
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits

Nikolai Karpov, Qin Zhang

In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent multi-armed bandits. For regret minimization in multi-armed bandits, we present the first set of tradeoffs between the number of rounds of communication among the agents and the regret of the collaborative learning process.

LGFeb 3, 2025
Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap

Nikolai Karpov, Chen Wang

We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap $Δ_{[2]}$. Here, and throughout, the optimality gap $Δ_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] have shown that if there is no known $Δ_{[2]}$, a pass complexity of $Θ(\log(1/Δ_{[2]}))$ (up to $\log\log(1/Δ_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/Δ^{2}_{[2]})$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $Δ_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., $O( \sum_{i=2}^{n}1/Δ^2_{[i]})$ arm pulls, with a sublinear memory size. In this work, we show that the ``right answer'' for the question is $Θ(\log{n})$ passes (up to $\log\log{n}$ terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o({n}/{\text{polylog}({n})})$ arms -- and $O(\sum_{i=2}^{n}{1}/{Δ^{2}_{[i]}}\cdot \log{(n)})$ arm pulls has to make $Ω(\frac{\log{n}}{\log\log{n}})$ passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of $Δ_{[2]}$, finds the best arm with $O( \sum_{i=2}^{n}1/Δ^2_{[i]} \cdot \log{n})$ arm pulls and a *single arm* memory.

LGAug 15, 2021
Batched Thompson Sampling for Multi-Armed Bandits

Nikolai Karpov, Qin Zhang

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or, batches). We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets. We also analyze the proposed algorithms from the theoretical aspect and obtain almost tight regret-batches tradeoffs for the two-arm case.

LGDec 2, 2020
Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit

Nikolai Karpov, Qin Zhang

Motivated by real-world applications such as fast fashion retailing and online advertising, the Multinomial Logit Bandit (MNL-bandit) is a popular model in online learning and operations research, and has attracted much attention in the past decade. However, it is a bit surprising that pure exploration, a basic problem in bandit theory, has not been well studied in MNL-bandit so far. In this paper we give efficient algorithms for pure exploration in MNL-bandit. Our algorithms achieve instance-sensitive pull complexities. We also complement the upper bounds by an almost matching lower bound.

DSApr 20, 2020
Collaborative Top Distribution Identifications with Limited Interaction

Nikolai Karpov, Qin Zhang, Yuan Zhou

We consider the following problem in this paper: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also called {\em top-$m$ arm identifications} in the literature of reinforcement learning, and has numerous applications. We study the problem in the collaborative learning model where we have multiple agents who can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoffs between the running time of learning process and the number of rounds of interaction between agents, which is very expensive in various scenarios. We give optimal time-round tradeoffs, as well as demonstrate complexity separations between top-$1$ arm identification and top-$m$ arm identifications for general $m$ and between fixed-time and fixed-confidence variants. As a byproduct, we also give an algorithm for selecting the distribution with the $m$-th largest mean in the collaborative learning model.