LG · AI · Sep 8, 2025

Learning Optimal Defender Strategies for CAGE-2 using a POMDP Model

arXiv:2509.06539v1 · 1 citation · h-index: 4 · CNSM
Originality: Incremental advance
AI Analysis

This work addresses cybersecurity defense for IT infrastructure, presenting an incremental improvement over existing methods.

The paper tackled the problem of learning optimal defender strategies for the CAGE-2 cybersecurity benchmark by constructing a POMDP model and introducing the BF-PPO method, which outperformed the leading CARDIFF method in strategy quality and training time.

CAGE-2 is an accepted benchmark for learning and evaluating defender strategies against cyberattacks. It models a scenario in which a defender agent protects an IT infrastructure against various attacks. Many defender methods for CAGE-2 have been proposed in the literature. In this paper, we construct a formal model of CAGE-2 using the framework of the Partially Observable Markov Decision Process (POMDP). Based on this model, we define an optimal defender strategy for CAGE-2 and introduce a method to learn this strategy efficiently. Our method, called BF-PPO, is based on PPO and uses a particle filter to mitigate the computational complexity caused by the large state space of the CAGE-2 model. We evaluate our method in the CAGE-2 CybORG environment and compare its performance with that of CARDIFF, the highest-ranked method on the CAGE-2 leaderboard. We find that our method outperforms CARDIFF in both the quality of the learned defender strategy and the required training time.
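The abstract states that BF-PPO uses a particle filter to keep belief tracking tractable over a large state space. The sketch below illustrates a generic bootstrap particle filter (propagate, weight, resample) on a deliberately tiny, hypothetical one-host model. The states (`CLEAN`, `USER`, `ROOT`), actions (`MONITOR`, `RESTORE`), `ESCALATE_PROB` and `ALERT_PROB` are all assumptions made for illustration; this is not the authors' CAGE-2 model or their BF-PPO code. Its point is that the cost per update scales with the number of particles rather than with the size of the state space, and the resulting empirical belief is one plausible input for a PPO policy.

```python
import random
from collections import Counter

# Hypothetical toy model (NOT the paper's CAGE-2 POMDP):
# one host whose hidden state is its compromise level.
CLEAN, USER, ROOT = 0, 1, 2
MONITOR, RESTORE = 0, 1

ESCALATE_PROB = 0.3                                  # assumed attacker success rate per step
ALERT_PROB = {CLEAN: 0.05, USER: 0.5, ROOT: 0.9}     # assumed P(alert | state)


def sample_transition(state, action, rng):
    """Sample s' ~ T(s, a, .) for the toy model."""
    if action == RESTORE:
        return CLEAN                                 # restoring resets the host
    if state < ROOT and rng.random() < ESCALATE_PROB:
        return state + 1                             # attacker escalates one level
    return state


def obs_likelihood(obs, state):
    """Observation likelihood Z(o | s'); obs is True if an alert fired."""
    p = ALERT_PROB[state]
    return p if obs else 1.0 - p


def particle_filter_update(particles, action, obs, rng):
    """One bootstrap particle-filter step: propagate, weight, resample."""
    propagated = [sample_transition(s, action, rng) for s in particles]
    weights = [obs_likelihood(obs, s) for s in propagated]
    if sum(weights) == 0.0:
        # No particle explains the observation; keep the unweighted prediction.
        return propagated
    # Multinomial resampling: draw N particles in proportion to their weights.
    return rng.choices(propagated, weights=weights, k=len(particles))


def belief_vector(particles):
    """Empirical belief b(s) as a fixed-length vector, e.g. as a policy input."""
    counts = Counter(particles)
    n = len(particles)
    return [counts[s] / n for s in (CLEAN, USER, ROOT)]


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [CLEAN] * 1000                       # start certain the host is clean
    steps = [(MONITOR, False), (MONITOR, True), (MONITOR, True), (RESTORE, False)]
    for action, obs in steps:
        particles = particle_filter_update(particles, action, obs, rng)
        print(action, obs, [round(p, 3) for p in belief_vector(particles)])
```

Multinomial resampling is used here only because it is the simplest to read; lower-variance schemes such as systematic resampling are common alternatives. The number of particles is the knob that trades belief accuracy against computation.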
