Optimal Learning for Dynamic Coding in Deadline-Constrained Multi-Channel Networks
This paper addresses a critical issue for communication systems with deadline constraints, though it appears incremental in that it adapts existing UCB methods to incorporate traffic dynamics.
The paper tackles the problem of serving delay-sensitive traffic over multi-channel networks with unknown statistics by developing the UCB-Deadline policy, which achieves bounded or logarithmic regret under symmetric conditions and is shown to be order-optimal.
We study the problem of serving randomly arriving, delay-sensitive traffic over a multi-channel communication system with time-varying channel states and unknown statistics. This problem deviates from the classical exploration-exploitation setting in that the design and analysis must accommodate the dynamics of packet availability and urgency, as well as the cost of each channel use at the time of decision. To that end, we develop and investigate an index-based policy, UCB-Deadline, which makes dynamic channel allocation decisions that incorporate these traffic requirements and costs. Under symmetric channel conditions, we prove that the UCB-Deadline policy achieves bounded regret in the likely case where the cost of using a channel is not too high to prevent all transmissions, and logarithmic regret otherwise. In this case, we show that UCB-Deadline is order-optimal. We also perform numerical investigations to validate the theoretical findings, and compare the performance of UCB-Deadline with that of another learning algorithm that we propose based on Thompson Sampling.
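The abstract does not state the UCB-Deadline index itself, so the following is only a minimal sketch of the general idea of a cost-aware, index-based channel selection rule. It assumes Bernoulli channels, a standard UCB1 index, and a simple rule that uses a channel only when a packet is waiting and the index exceeds the per-use cost. It ignores packet deadlines and coding entirely. All names (`ucb_index`, `select_channels`, `simulate`) and the threshold rule are hypothetical and should not be read as the paper's policy.

```python
import math
import random


def ucb_index(successes, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus.

    An unexplored channel gets an infinite index so it is tried first.
    """
    if pulls == 0:
        return float("inf")
    return successes / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def select_channels(successes, pulls, t, cost, has_packet):
    """Hypothetical rule: use every channel whose index exceeds the cost.

    No channel is used when there is no packet to send, which is how
    traffic availability limits exploration in this toy model.
    """
    if not has_packet:
        return []
    return [
        k
        for k in range(len(pulls))
        if ucb_index(successes[k], pulls[k], t) > cost
    ]


def simulate(success_probs, cost, arrival_prob, horizon, seed=0):
    """Run the toy policy and return per-channel (successes, pulls)."""
    rng = random.Random(seed)
    n = len(success_probs)
    successes = [0] * n
    pulls = [0] * n
    for t in range(1, horizon + 1):
        # Assumed Bernoulli arrivals: one packet with probability arrival_prob.
        has_packet = rng.random() < arrival_prob
        for k in select_channels(successes, pulls, t, cost, has_packet):
            pulls[k] += 1
            if rng.random() < success_probs[k]:
                successes[k] += 1
    return successes, pulls
```

Calling `simulate([0.8, 0.3], cost=0.5, arrival_prob=0.6, horizon=1000)` returns the success and usage counts per channel. In this toy model, a cost far above any attainable index value means each channel is tried once and then abandoned, which loosely mirrors the abstract's distinction between costs that permit transmissions and costs that prevent them.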