Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection
This work addresses efficient and robust waveform selection for adaptive radar systems in dynamic environments. It is an incremental contribution: it applies existing bandit algorithms to the problem and adds a time-varying constraint on the radar's waveform catalog.
The paper formulates adaptive radar waveform selection as a linear contextual bandit that uses spectrum observations and receiver feedback to improve target detection. Simulations show substantial gains in target detection performance in radar-communication coexistence and adversarial jamming scenarios, and a time-varying constraint on the waveform catalog mitigates the harmful effects of pulse-agile behavior.
This paper studies a sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel. The radar is capable of passively sensing the spectrum at regular intervals, which provides side information for the waveform selection process. The radar transmitter uses the sequence of spectrum observations, as well as feedback from a collocated receiver, to select waveforms that support accurate estimation of target parameters. It is shown that the waveform selection problem can be effectively addressed using a linear contextual bandit formulation in a manner that is both computationally feasible and sample efficient. Stochastic and adversarial linear contextual bandit models are introduced, allowing the radar to achieve effective performance in broad classes of physical environments. Simulations in a radar-communication coexistence scenario, as well as in an adversarial radar-jammer scenario, demonstrate that the proposed formulation provides a substantial improvement in target detection performance when Thompson Sampling and EXP3 algorithms are used to drive the waveform selection process. Further, it is shown that the harmful impacts of pulse-agile behavior on coherently processed radar data can be mitigated by adopting a time-varying constraint on the radar's waveform catalog.
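To make the formulation concrete, the following is a minimal sketch of linear Thompson Sampling over a waveform catalog whose allowed subset changes with time. It is not the paper's implementation. The names `LinearThompsonSampler`, `allowed_waveforms`, and `run` are hypothetical, as are the toy one-hot spectrum context, the binary reward, and the specific constraint rule (full catalog at the start of each coherent processing interval, then only waveforms within `max_step` catalog indices of the previous choice). The abstract states only that a time-varying constraint on the catalog is used; the rule below is an assumption chosen for illustration.

```python
import numpy as np


class LinearThompsonSampler:
    """Per-waveform Bayesian linear model with Gaussian posterior sampling."""

    def __init__(self, n_waveforms, context_dim, noise_var=0.25, prior_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # Posterior precision matrix and weighted response vector for each waveform.
        self.precision = np.stack(
            [np.eye(context_dim) / prior_var for _ in range(n_waveforms)]
        )
        self.b = np.zeros((n_waveforms, context_dim))

    def select(self, context, allowed):
        """Sample a parameter vector per allowed waveform and pick the best score."""
        best, best_score = None, -np.inf
        for k in allowed:
            cov = np.linalg.inv(self.precision[k])
            mean = cov @ self.b[k]
            theta = self.rng.multivariate_normal(mean, cov)
            score = float(theta @ context)
            if score > best_score:
                best, best_score = k, score
        return best

    def update(self, k, context, reward):
        """Standard Bayesian linear regression update for the chosen waveform."""
        self.precision[k] += np.outer(context, context) / self.noise_var
        self.b[k] += context * reward / self.noise_var


def allowed_waveforms(t, previous, n_waveforms, cpi_len=8, max_step=1):
    """Assumed constraint: free choice at the start of each CPI, small steps within it."""
    if previous is None or t % cpi_len == 0:
        return list(range(n_waveforms))
    lo = max(0, previous - max_step)
    hi = min(n_waveforms - 1, previous + max_step)
    return list(range(lo, hi + 1))


def run(n_rounds=200, n_waveforms=4, seed=0):
    """Toy loop: one band is occupied per round; reward is 1 if the waveform avoids it."""
    rng = np.random.default_rng(seed)
    agent = LinearThompsonSampler(n_waveforms, context_dim=n_waveforms, seed=seed)
    previous, total, history = None, 0.0, []
    for t in range(n_rounds):
        occupied = int(rng.integers(n_waveforms))  # toy interference state
        context = np.zeros(n_waveforms)
        context[occupied] = 1.0                    # one-hot spectrum observation
        allowed = allowed_waveforms(t, previous, n_waveforms)
        k = agent.select(context, allowed)
        reward = 0.0 if k == occupied else 1.0     # toy receiver feedback
        agent.update(k, context, reward)
        history.append((allowed, k))
        previous = k
        total += reward
    return total / n_rounds, history
```

Calling `run()` returns the average toy reward and the per-round list of allowed sets and choices. Shrinking `max_step` or lengthening `cpi_len` tightens the constraint, trading waveform agility for less pulse-to-pulse variation within a processing interval. An adversarial variant would replace the posterior sampling step with an exponential-weights rule such as EXP3.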