Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game
This addresses efficient, collision-free spectrum sharing for IoT devices in ad-hoc networks, though it is incremental as it builds on existing bandit algorithms for a specific domain.
The paper tackles decentralized channel allocation in IoT networks with limited sensing and computational resources by mapping it to a contextual multi-player multi-armed bandit game, proposing a three-stage policy learning algorithm that achieves social optimal allocation with polylogarithmic regret and balances efficiency and scalability in simulations.
We study a decentralized channel allocation problem in an ad-hoc Internet of Things network underlaying on the spectrum licensed to a primary cellular network. In the considered network, the impoverished channel sensing/probing capability and computational resource on the IoT devices make them difficult to acquire the detailed Channel State Information (CSI) for the shared multiple channels. In practice, the unknown patterns of the primary users' transmission activities and the time-varying CSI (e.g., due to small-scale fading or device mobility) also cause stochastic changes in the channel quality. Decentralized IoT links are thus expected to learn channel conditions online based on partial observations, while acquiring no information about the channels that they are not operating on. They also have to reach an efficient, collision-free solution of channel allocation with limited coordination. Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error. Theoretical analyses shows that the proposed scheme guarantees the IoT links to jointly converge to the social optimal channel allocation with a sub-linear (i.e., polylogarithmic) regret with respect to the operational time. Simulations demonstrate that it strikes a good balance between efficiency and network scalability when compared with the other state-of-the-art decentralized bandit algorithms.