LG | CR | Feb 9

SoK: The Pitfalls of Deep Reinforcement Learning for Cybersecurity

arXiv:2602.08690v1 | 2 citations | h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses the unreliability of deep reinforcement learning (DRL) applications in cybersecurity, a concern for both researchers and practitioners, and shows how widespread the underlying issues are in the literature.

The paper identifies and systematizes 11 methodological pitfalls in applying Deep Reinforcement Learning (DRL) to cybersecurity, finding an average of over five pitfalls per paper across 66 analyzed studies, and provides actionable recommendations to improve rigor and deployability.

Deep Reinforcement Learning (DRL) has achieved remarkable success in domains requiring sequential decision-making, motivating its application to cybersecurity problems. However, transitioning DRL from laboratory simulations to bespoke cyber environments can introduce numerous issues. This is further exacerbated by the often adversarial, non-stationary, and partially observable nature of most cybersecurity tasks. In this paper, we identify and systematize 11 methodological pitfalls that frequently occur in the DRL for cybersecurity (DRL4Sec) literature across the stages of environment modeling, agent training, performance evaluation, and system deployment. By analyzing 66 significant DRL4Sec papers (2018-2025), we quantify the prevalence of each pitfall and find an average of over five pitfalls per paper. We demonstrate the practical impact of these pitfalls using controlled experiments in (i) autonomous cyber defense, (ii) adversarial malware creation, and (iii) web security testing environments. Finally, we provide actionable recommendations for each pitfall to support the development of more rigorous and deployable DRL-based security systems.

