Adversarial Online Learning with Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity
This work addresses dynamic cybersecurity resource allocation in transportation systems. Its contribution is an incremental extension of adversarial multi-play bandit methods, targeted at connected and automated vehicle security.
The paper tackles variable resource allocation in cybersecurity for connected vehicles by extending the adversarial multi-play multi-armed bandit so that the number of arms played can vary from round to round. It models the interaction between attacker and defender as a sequential pursuit-evasion game and derives the condition under which a Nash equilibrium exists. It provides an exponentially weighted algorithm with sublinear pseudo-regret for the defender, gives lower and upper bounds on the attacker's average reward under heterogeneous rewards, and reports numerical experiments that illustrate the benefit of variable-arm play.
We extend the adversarial (non-stochastic) multi-play multi-armed bandit (MPMAB) to the case where the number of arms to play is variable. The work is motivated by the fact that the resources allocated to scan different critical locations in an interconnected transportation system change dynamically over time and with the environment. By modeling the malicious hacker and the intrusion monitoring system as the attacker and the defender, respectively, we formulate the problem for the two players as a sequential pursuit-evasion game. We derive the condition under which a Nash equilibrium of the strategic game exists. For the defender, we provide an exponentially weighted algorithm with sublinear pseudo-regret. We further extend our model to heterogeneous rewards for both players, and obtain lower and upper bounds on the average reward for the attacker. We provide numerical experiments to demonstrate the effectiveness of variable-arm play.
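To make the defender's side concrete, the following is a minimal Python sketch of an exponentially weighted learner that plays a different number of arms in each round. It is not the paper's algorithm: it is a simplified Exp3-style variant written under our own assumptions. The class name `VariablePlayExp3`, the helpers `capped_marginals` and `systematic_sample`, the parameters `eta` and `gamma`, the cycling budget `k`, and the toy attacker that always hits location 0 are all hypothetical choices made for illustration.

```python
import math
import random


def capped_marginals(weights, k, gamma):
    """Return inclusion probabilities q with sum(q) == k and every q_i <= 1."""
    n = len(weights)
    total = sum(weights)
    # Exponential-weights distribution mixed with uniform exploration, scaled by k.
    q = [k * ((1 - gamma) * w / total + gamma / n) for w in weights]
    # Clip at 1 and rescale the remaining arms so the marginals still sum to k.
    while max(q) > 1 + 1e-12:
        capped = [qi >= 1 for qi in q]
        free_mass = sum(qi for qi, c in zip(q, capped) if not c)
        budget = k - sum(capped)
        q = [1.0 if c else qi * budget / free_mass for qi, c in zip(q, capped)]
    return q


def systematic_sample(q, rng):
    """Pick a subset whose per-arm inclusion probability equals q_i."""
    u = rng.random()
    chosen, cum = [], 0.0
    for arm, qi in enumerate(q):
        lo, cum = cum, cum + qi
        # Arm is chosen when one of the grid points u, u+1, u+2, ... lands in its interval.
        if math.floor(cum - u) > math.floor(lo - u):
            chosen.append(arm)
    return chosen


class VariablePlayExp3:
    """Assumed, simplified learner: Exp3-style update with k arms played per round."""

    def __init__(self, n_arms, eta, gamma, seed=0):
        self.log_w = [0.0] * n_arms  # log-weights for numerical stability
        self.eta = eta
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.q = None
        self.played = []

    def select(self, k):
        m = max(self.log_w)
        weights = [math.exp(lw - m) for lw in self.log_w]
        self.q = capped_marginals(weights, k, self.gamma)
        self.played = systematic_sample(self.q, self.rng)
        return self.played

    def update(self, rewards):
        # Bandit feedback: only rewards of played arms are read,
        # each divided by its inclusion probability (importance weighting).
        for arm in self.played:
            self.log_w[arm] += self.eta * rewards[arm] / self.q[arm]


def run_demo(rounds=300, n_arms=5, seed=0):
    learner = VariablePlayExp3(n_arms, eta=0.1, gamma=0.1, seed=seed)
    history = []
    for t in range(rounds):
        k = 1 + t % 3  # hypothetical scanning budget cycling through 1, 2, 3
        played = learner.select(k)
        # Toy, non-adaptive attacker: the defender is rewarded only at location 0.
        rewards = [1.0 if arm == 0 else 0.0 for arm in range(n_arms)]
        learner.update(rewards)
        history.append((k, played))
    return learner, history
```

Calling `run_demo()` returns the learner and the per-round `(k, played)` pairs, so one can check how quickly the weight on the attacked location grows as the budget changes. The fixed-order systematic sampling and the update of capped arms are simplifications of this sketch, and a truly adversarial attacker would adapt its target rather than stay on one location.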