Online Learning with Randomized Feedback Graphs for Optimal PUE Attacks in Cognitive Radio Networks
This addresses security vulnerabilities in cognitive radio networks for attackers, though it is incremental as it applies existing online learning methods to a new attack scenario.
The paper tackles the problem of designing optimal primary user emulation (PUE) attack strategies in cognitive radio networks by formulating it as an online learning problem, showing that with at least one channel observation, the attacker achieves significant performance improvements and optimal regret bounds.
In a cognitive radio network, a secondary user learns the spectrum environment and dynamically accesses the channel where the primary user is inactive. At the same time, a primary user emulation (PUE) attacker can send falsified primary user signals and prevent the secondary user from utilizing the available channel. The best attacking strategies that an attacker can apply have not been well studied. In this paper, for the first time, we study optimal PUE attack strategies by formulating an online learning problem where the attacker needs to dynamically decide the attacking channel in each time slot based on its attacking experience. The challenge in our problem is that since the PUE attack happens in the spectrum sensing phase, the attacker cannot observe the reward on the attacked channel. To address this challenge, we utilize the attacker's observation capability. We propose online learning-based attacking strategies based on the attacker's observation capabilities. Through our analysis, we show that with no observation within the attacking slot, the attacker loses on the regret order, and with the observation of at least one channel, there is a significant improvement on the attacking performance. Observation of multiple channels does not give additional benefit to the attacker (only a constant scaling) though it gives insight on the number of observations required to achieve the minimum constant factor. Our proposed algorithms are optimal in the sense that their regret upper bounds match their corresponding regret lower-bounds. We show consistency between simulation and analytical results under various system parameters.