Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
This work addresses spectrum efficiency for radar-cellular coexistence, but the contribution is incremental because it builds on existing contextual bandit methods.
The paper tackles adaptive radar transmission for spectrum sharing with a non-cooperative cellular network by applying a linear Contextual Bandit framework with Thompson Sampling. The approach converges faster than comparable contextual bandit algorithms to behavior that minimizes mutual interference and maximizes spectrum utilization, and it remains competitive with a more complex Deep Q-Network.
This paper describes a sequential, or online, learning scheme for adaptive radar transmissions that facilitate spectrum sharing with a non-cooperative cellular network. First, the interference channel between the radar and a spatially distant cellular network is modeled. Then, a linear Contextual Bandit (CB) learning framework is applied to drive the radar's behavior. The fundamental trade-off between exploration and exploitation is balanced by a proposed Thompson Sampling (TS) algorithm, a pseudo-Bayesian approach which selects waveform parameters based on the posterior probability that a specific waveform is optimal, given discounted channel information as context. It is shown that the contextual TS approach converges more rapidly to behavior that minimizes mutual interference and maximizes spectrum utilization than comparable contextual bandit algorithms. Additionally, we show that the TS learning scheme results in a favorable SINR distribution compared to other online learning algorithms. Finally, the proposed TS algorithm is compared to a deep reinforcement learning model. We show that the TS algorithm maintains competitive performance with a more complex Deep Q-Network (DQN).
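The selection rule described in the abstract can be illustrated with a minimal sketch of linear contextual Thompson Sampling. This is not the paper's implementation. The class name `LinearThompsonSampler`, the helper `run_demo`, the posterior scale `v`, and the synthetic linear reward are all assumptions made for illustration. Each arm stands in for a candidate waveform, and the context vector stands in for the channel information. The discounting of channel information and the radar-specific reward are not modeled here.

```python
import numpy as np


class LinearThompsonSampler:
    """Per-arm Bayesian linear regression with Thompson Sampling (illustrative)."""

    def __init__(self, n_arms, dim, v=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.v = v  # hypothetical scale of the posterior used for sampling
        # One precision matrix (identity prior) and one response vector per arm.
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.f = np.zeros((n_arms, dim))

    def select(self, context):
        """Sample a weight vector per arm and pick the arm with the best sampled score."""
        scores = []
        for B_a, f_a in zip(self.B, self.f):
            cov = np.linalg.inv(B_a)
            cov = (cov + cov.T) / 2.0  # symmetrize against round-off
            mu = cov @ f_a  # posterior mean of the arm's weights
            theta = self.rng.multivariate_normal(mu, self.v ** 2 * cov)
            scores.append(context @ theta)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Rank-one posterior update for the arm that was played."""
        self.B[arm] += np.outer(context, context)
        self.f[arm] += reward * context


def run_demo(n_rounds=2000, n_arms=4, dim=3, seed=1):
    """Run on a synthetic linear reward; return the fraction of optimal picks
    made during the second half of the run."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, dim))  # assumed ground truth per arm
    agent = LinearThompsonSampler(n_arms, dim, seed=seed)
    half = n_rounds // 2
    optimal_picks = 0
    for t in range(n_rounds):
        context = rng.normal(size=dim)
        arm = agent.select(context)
        reward = context @ true_theta[arm] + 0.1 * rng.normal()
        agent.update(arm, context, reward)
        if t >= half:
            optimal_picks += int(arm == int(np.argmax(true_theta @ context)))
    return optimal_picks / (n_rounds - half)


if __name__ == "__main__":
    print(f"optimal-arm rate, second half: {run_demo():.3f}")
```

Because each arm's score is computed from a draw from its posterior rather than from the posterior mean, arms with few observations have wide posteriors and are still tried occasionally. In this sketch that is the only exploration mechanism; no separate exploration schedule is needed.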