Learning Optimal Defender Strategies for CAGE-2 using a POMDP Model
This work addresses automated cybersecurity defense of IT infrastructure and presents an incremental improvement over existing defender methods for the CAGE-2 benchmark.
The paper tackles the problem of learning optimal defender strategies for the CAGE-2 cybersecurity benchmark by constructing a POMDP model and introducing the BF-PPO method, which outperforms the leading CARDIFF method in both strategy quality and training time.
CAGE-2 is an accepted benchmark for learning and evaluating defender strategies against cyberattacks. It models a scenario in which a defender agent protects an IT infrastructure against various attacks. Many defender methods for CAGE-2 have been proposed in the literature. In this paper, we construct a formal model of CAGE-2 using the framework of a Partially Observable Markov Decision Process (POMDP). Based on this model, we define an optimal defender strategy for CAGE-2 and introduce a method to learn this strategy efficiently. Our method, called BF-PPO, is based on Proximal Policy Optimization (PPO) and uses a particle filter to mitigate the computational complexity caused by the large state space of the CAGE-2 model. We evaluate our method in the CAGE-2 CybORG environment and compare its performance with that of CARDIFF, the highest-ranked method on the CAGE-2 leaderboard. We find that our method outperforms CARDIFF in both the quality of the learned defender strategy and the required training time.
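To illustrate the role a particle filter can play in a POMDP with a large state space, the following is a minimal sketch of a generic bootstrap particle filter belief update. It is not the BF-PPO implementation: the single-host toy model, the probabilities, and all names (`particle_filter_update`, `toy_transition`, `toy_obs_likelihood`, `belief_from_particles`) are assumptions made for illustration only. The idea is that the belief is represented by a fixed number of sampled states rather than a distribution over the full state space, so the cost of an update scales with the number of particles.

```python
import random
from collections import Counter


def particle_filter_update(particles, action, observation,
                           transition, obs_likelihood, rng):
    """One bootstrap particle filter step: propagate, weight, resample."""
    # Propagate every particle through the sampled transition model.
    propagated = [transition(s, action, rng) for s in particles]
    # Weight each propagated particle by how well it explains the observation.
    weights = [obs_likelihood(observation, s) for s in propagated]
    if sum(weights) == 0:
        # No particle explains the observation: fall back to uniform weights.
        weights = [1.0] * len(propagated)
    # Resample with replacement in proportion to the weights.
    return rng.choices(propagated, weights=weights, k=len(propagated))


def belief_from_particles(particles):
    """Empirical belief: fraction of particles in each state."""
    counts = Counter(particles)
    return {s: c / len(particles) for s, c in counts.items()}


# Hypothetical toy model (NOT the CAGE-2 model): one host whose state is
# 0 (clean) or 1 (compromised).
def toy_transition(state, action, rng):
    if action == "restore":
        return 0  # assumed: restoring always cleans the host
    if state == 0:
        return 1 if rng.random() < 0.2 else 0  # assumed attack probability
    return 1  # a compromised host stays compromised until restored


def toy_obs_likelihood(alert, state):
    # Assumed alert probabilities: 0.8 if compromised, 0.1 if clean.
    p_alert = 0.8 if state == 1 else 0.1
    return p_alert if alert == 1 else 1.0 - p_alert


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [0] * 2000  # start certain that the host is clean
    particles = particle_filter_update(
        particles, "wait", 1, toy_transition, toy_obs_likelihood, rng)
    print(belief_from_particles(particles))
```

In a learning setup of this kind, the empirical belief (or a summary of it) would typically be passed to the policy as its input in place of the raw observation; how BF-PPO does this in detail is not specified in the abstract.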