Collaborative Top Distribution Identifications with Limited Interaction
This work studies efficient top-m arm identification, a problem from the reinforcement learning literature, in scenarios where interaction between agents is costly, and characterizes the tradeoff between running time and the number of rounds of interaction.
The paper tackles the problem of identifying the top-m distributions with the largest means in a collaborative learning model with multiple agents, providing optimal tradeoffs between running time and interaction rounds and demonstrating complexity separations between different variants.
We consider the following problem in this paper: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also called {\em top-$m$ arm identifications} in the reinforcement learning literature, and has numerous applications. We study the problem in the collaborative learning model, where multiple agents can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoffs between the running time of the learning process and the number of rounds of interaction between agents, since such interaction is very expensive in various scenarios. We give optimal time-round tradeoffs, and demonstrate complexity separations between top-$1$ arm identification and top-$m$ arm identifications for general $m$, and between the fixed-time and fixed-confidence variants. As a byproduct, we also give an algorithm for selecting the distribution with the $m$-th largest mean in the collaborative learning model.
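To make the setting concrete, the following is a minimal toy simulation of the collaborative model: agents sample in parallel and exchange statistics only at the end of each round. It is a sketch under stated assumptions, not the algorithm of this paper. The function name `collaborative_top_m`, its parameters, the Bernoulli reward model, and the geometric elimination schedule are all hypothetical choices made for illustration, and the sketch assumes $1 \le m \le n$.

```python
import random


def collaborative_top_m(means, m, num_agents, rounds, pulls_per_round, seed=0):
    """Toy round-based elimination for top-m identification (illustrative only)."""
    rng = random.Random(seed)
    n = len(means)
    survivors = list(range(n))
    totals = [0.0] * n  # pooled reward sums, updated only when agents communicate
    counts = [0] * n    # pooled pull counts

    for r in range(1, rounds + 1):
        # Local phase: every agent samples each surviving arm, with no communication.
        local = []
        for _agent in range(num_agents):
            sums = {arm: 0 for arm in survivors}
            for arm in survivors:
                for _pull in range(pulls_per_round):
                    # Assumption: each distribution is Bernoulli with mean means[arm].
                    sums[arm] += 1 if rng.random() < means[arm] else 0
            local.append(sums)

        # Communication phase: one round of interaction pools the local statistics.
        for sums in local:
            for arm, s in sums.items():
                totals[arm] += s
                counts[arm] += pulls_per_round

        # Shrink the survivor set geometrically from n down to m over all rounds.
        keep = max(m, round(n ** (1 - r / rounds) * m ** (r / rounds)))
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[:keep]

    return sorted(survivors)


# Hypothetical instance: 8 distributions, 4 agents, 3 rounds of interaction.
example_means = [0.9, 0.8, 0.7, 0.2, 0.15, 0.1, 0.05, 0.3]
print(collaborative_top_m(example_means, m=3, num_agents=4, rounds=3, pulls_per_round=200))
```

In this sketch, `rounds` counts the interaction steps and `pulls_per_round` is the sequential sampling each agent performs per arm between them, so the two parameters expose the time-round tradeoff that the abstract refers to: fewer rounds leave fewer chances to discard weak distributions early, so more samples are spent on them.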