CR AI HC

Feb 22, 2025

Human-AI Collaboration in Cloud Security: Cognitive Hierarchy-Driven Deep Reinforcement Learning

Zahra Aref, Sheng Wei, Narayan B. Mandayam

arXiv:2502.16054v26.44 citations

h-index: 55

Originality: Incremental advance

AI Analysis

This work helps SOC analysts counter adaptive adversarial tactics in cloud security. It is an incremental advance that integrates cognitive models into existing deep reinforcement learning methods.

The paper tackles real-time threat mitigation in cloud Security Operations Centers (SOCs) by proposing a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework. In simulations, CHT-DQN achieves higher data protection and lower action discrepancies than standard DQN; in human-in-the-loop evaluations, it improves defense outcomes.

Given the complexity of multi-tenant cloud environments and the growing need for real-time threat mitigation, Security Operations Centers (SOCs) must adopt AI-driven adaptive defense mechanisms to counter Advanced Persistent Threats (APTs). However, SOC analysts face challenges in handling adaptive adversarial tactics, requiring intelligent decision-support frameworks. We propose a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots. The SOC analyst (defender) operates at cognitive level-1, anticipating attacker strategies, while the APT bot (attacker) follows a level-0 policy. By incorporating CHT into DQN, our framework enhances adaptive SOC defense using Attack Graph (AG)-based reinforcement learning. Simulation experiments across varying AG complexities show that CHT-DQN consistently achieves higher data protection and lower action discrepancies compared to standard DQN. A theoretical lower bound further confirms its superiority as AG complexity increases. A human-in-the-loop (HITL) evaluation on Amazon Mechanical Turk (MTurk) reveals that SOC analysts using CHT-DQN-derived transition probabilities align more closely with adaptive attackers, leading to better defense outcomes. Moreover, human behavior aligns with Prospect Theory (PT) and Cumulative Prospect Theory (CPT): participants are less likely to reselect failed actions and more likely to persist with successful ones. This asymmetry reflects amplified loss sensitivity and biased probability weighting -- underestimating gains after failure and overestimating continued success. Our findings highlight the potential of integrating cognitive models into deep reinforcement learning to improve real-time SOC decision-making for cloud security.
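The level-1 versus level-0 reasoning described in the abstract can be illustrated with a minimal sketch. This is not the authors' CHT-DQN implementation: it assumes the common Cognitive Hierarchy convention that a level-0 player chooses uniformly at random, and it replaces the learned Q-values with a hand-written payoff table. All action names and payoff values below are hypothetical.

```python
from typing import Dict, List, Tuple


def level0_attacker_policy(attack_actions: List[str]) -> Dict[str, float]:
    """Level-0 attacker: non-strategic, assumed uniform over its actions."""
    p = 1.0 / len(attack_actions)
    return {a: p for a in attack_actions}


def level1_defender_action(
    defend_actions: List[str],
    attacker_policy: Dict[str, float],
    payoff: Dict[Tuple[str, str], float],
) -> str:
    """Level-1 defender: best response to the believed level-0 attacker."""

    def expected_payoff(d: str) -> float:
        # Average the defender's payoff over the attacker's action distribution.
        return sum(prob * payoff[(d, a)] for a, prob in attacker_policy.items())

    return max(defend_actions, key=expected_payoff)


# Hypothetical toy game: defender payoff for each (defense, attack) pair.
attack_actions = ["exploit_web", "exploit_db"]
defend_actions = ["patch_web", "patch_db"]
payoff = {
    ("patch_web", "exploit_web"): 1.0,
    ("patch_web", "exploit_db"): -2.0,
    ("patch_db", "exploit_web"): -1.0,
    ("patch_db", "exploit_db"): 1.0,
}

policy = level0_attacker_policy(attack_actions)
best = level1_defender_action(defend_actions, policy, payoff)
```

In this toy game the defender's expected payoff is -0.5 for `patch_web` and 0.0 for `patch_db`, so the level-1 choice is `patch_db`. In a DQN setting, the payoff table would presumably be replaced by learned action values over attack-graph states.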
