Multi-user lax communications: a multi-armed bandit approach
This paper addresses efficient channel allocation in distributed networks such as cognitive radio, but its contribution is incremental, since it builds on existing multi-armed bandit approaches.
The paper tackles the problem of multiple users learning to share communication channels without collisions in a distributed setting, which it models as a multi-user multi-armed bandit problem. It develops an algorithm with convergence guarantees and validates it experimentally against state-of-the-art methods.
Inspired by cognitive radio networks, we consider a setting where multiple users share several channels, which we model as a multi-user multi-armed bandit (MAB) problem. The characteristics of each channel are unknown and are different for each user. Each user can choose between the channels, but her success depends on the particular channel chosen as well as on the selections of other users: if two users select the same channel, their messages collide and neither of them manages to send any data. Our setting is fully distributed, so there is no central control. As in many communication systems, the users cannot set up a direct communication protocol, so information exchange must be kept to a minimum. We develop an algorithm for learning a stable configuration for the multi-user MAB problem. We further offer both convergence guarantees and experiments inspired by real communication networks, including a comparison to state-of-the-art algorithms.
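The collision rule described above can be illustrated with a minimal sketch of a single round. This is not the paper's algorithm; it only simulates the reward model. The function name `play_round`, the Bernoulli success model, and the `success_prob` table are assumptions made for illustration.

```python
import random

def play_round(choices, success_prob, rng):
    """Return a 0/1 reward per user for one round (illustrative model only).

    choices[k] is the channel picked by user k. success_prob[k][c] is a
    hypothetical probability that user k transmits successfully on channel c
    when no other user picked it; it differs per user, as in the setting above.
    """
    # Count how many users picked each channel.
    counts = {}
    for channel in choices:
        counts[channel] = counts.get(channel, 0) + 1

    rewards = []
    for user, channel in enumerate(choices):
        if counts[channel] > 1:
            # Collision: every user on this channel fails to send data.
            rewards.append(0)
        else:
            # Sole user on the channel: success is a Bernoulli draw (assumed).
            p = success_prob[user][channel]
            rewards.append(1 if rng.random() < p else 0)
    return rewards

# Three users, two channels; users 0 and 1 collide on channel 0.
rng = random.Random(0)
probs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(play_round([0, 0, 1], probs, rng))  # prints [0, 0, 1]
```

In this sketch each user sees only her own reward, so a zero is ambiguous between a collision and a bad channel; how much users can infer beyond that depends on the feedback model, which the abstract does not specify.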