Constrained Thompson Sampling for Wireless Link Optimization
This paper addresses the challenge of optimizing wireless link performance in communication systems, but the contribution is incremental: it adapts an existing method, Thompson sampling, to a specific constraint.
The paper tackles the problem of selecting data transmission rates in wireless links to maximize throughput under latency constraints. It models rate selection as a stochastic multi-armed bandit problem and proposes Con-TS, a constrained Thompson sampling algorithm. Over T transmission intervals with K available rates, Con-TS achieves an expected number of constraint violations bounded by O(√(KT)) and an expected loss in cumulative throughput bounded by O(√(KT log K)).
Wireless communication systems operate in complex time-varying environments. Therefore, selecting the optimal configuration parameters in these systems is a challenging problem. For wireless links, \emph{rate selection} is used to select the optimal data transmission rate that maximizes the link throughput subject to an application-defined latency constraint. We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates is modeled as independent bandit arms. For this setup, we propose Con-TS, a novel constrained version of the Thompson sampling algorithm, where the latency requirement is modeled by a high-probability linear constraint. We show that for Con-TS, the expected number of constraint violations over T transmission intervals is upper bounded by O(\sqrt{KT}), where K is the number of available rates. Further, the expected loss in cumulative throughput compared to the optimal rate selection scheme (i.e., the regret) is also upper bounded by O(\sqrt{KT \log K}). Through numerical simulations, we demonstrate that Con-TS significantly outperforms state-of-the-art bandit schemes for rate selection.
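To make the bandit formulation concrete, the following Python sketch runs a simplified constrained Thompson sampling loop over a small set of rates. It is an illustration under stated assumptions and should not be read as the Con-TS algorithm itself, whose details are not given in this text. The sketch assumes Bernoulli packet success for each rate with Beta(1, 1) priors, stands in for the latency requirement with a minimum success probability `min_success`, and falls back to the most reliable sampled arm when no arm looks feasible. The function name `con_ts_sketch` and all parameter names are hypothetical.

```python
import random


def con_ts_sketch(rates, true_success, min_success, horizon, seed=0):
    """Simplified constrained Thompson sampling (illustrative, not Con-TS).

    rates        -- data rate of each arm (assumed units, e.g. bits/interval)
    true_success -- true packet success probability per arm (simulation only)
    min_success  -- assumed stand-in for the latency constraint
    horizon      -- number of transmission intervals T
    """
    rng = random.Random(seed)
    k = len(rates)
    alpha = [1] * k  # Beta posterior: 1 + observed successes
    beta = [1] * k   # Beta posterior: 1 + observed failures
    counts = [0] * k
    violations = 0
    throughput = 0.0

    for _ in range(horizon):
        # Draw one plausible success probability per arm from its posterior.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]

        # Keep only the arms whose sampled success probability meets the constraint.
        feasible = [i for i in range(k) if theta[i] >= min_success]
        if feasible:
            # Among feasible arms, maximize the sampled expected throughput.
            arm = max(feasible, key=lambda i: rates[i] * theta[i])
        else:
            # Assumed fallback: play the arm that currently looks most reliable.
            arm = max(range(k), key=lambda i: theta[i])

        # Simulate the transmission and update the played arm's posterior.
        success = rng.random() < true_success[arm]
        if success:
            alpha[arm] += 1
            throughput += rates[arm]
        else:
            beta[arm] += 1
        counts[arm] += 1

        # A violation here means playing an arm that is truly infeasible.
        if true_success[arm] < min_success:
            violations += 1

    return counts, violations, throughput


if __name__ == "__main__":
    counts, violations, throughput = con_ts_sketch(
        rates=[1.0, 2.0, 4.0],
        true_success=[0.95, 0.8, 0.3],
        min_success=0.7,
        horizon=2000,
    )
    print(counts, violations, throughput)
```

In this toy setting the fastest rate is infeasible, so a reasonable policy should concentrate on the middle rate, which has the highest expected throughput among the rates that satisfy the constraint. Counting violations against the true success probabilities is only possible in simulation.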