Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
This work addresses the challenge of balancing rewards and risks in active cyber defense for security practitioners. It is an incremental improvement that applies a novel method to a known bottleneck.
The paper optimizes honeynet engagement by modeling it as a Semi-Markov Decision Process and applying reinforcement learning to design adaptive policies. The resulting policies attract attackers quickly, engage them long enough to collect threat information, keep the penetration probability low, and yield utility that is robust to varied attacker persistence and intelligence.
A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them for a sufficiently long period to obtain worthy threat information. Meanwhile, the penetration probability is kept at a low level. The results show that the expected utility is robust against attackers across a wide range of persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to overcome the curse of modeling. Under a prudent choice of the learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.
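To make the last step concrete, the following is a minimal sketch of Q-learning adapted to an SMDP, in which the learner sees only sampled transitions and sojourn times rather than the model itself. It assumes continuous-time discounting at rate GAMMA, exponentially distributed sojourn times, and a decaying learning rate with epsilon-greedy exploration. The three states, the two engagement levels, and every number in MODEL are invented for illustration and are not taken from the paper. The update is the standard SMDP form of Q-learning and is not necessarily the authors' exact algorithm.

```python
import math
import random

GAMMA = 0.1      # continuous-time discount rate (assumed value)
EPSILON = 0.1    # exploration probability (assumed value)
NORMAL, HONEYPOT, ABSORBING = 0, 1, 2   # hypothetical toy states
LOW, HIGH = 0, 1                        # hypothetical engagement levels

# Hypothetical simulator parameters, hidden from the learner:
# (state, action) -> (reward rate, mean sojourn time,
#                     probabilities of moving to NORMAL, HONEYPOT, ABSORBING)
MODEL = {
    (NORMAL, LOW):    (0.0, 1.0, [0.5, 0.3, 0.2]),
    (NORMAL, HIGH):   (0.0, 1.0, [0.2, 0.7, 0.1]),
    (HONEYPOT, LOW):  (1.0, 2.0, [0.1, 0.6, 0.3]),
    (HONEYPOT, HIGH): (3.0, 2.0, [0.1, 0.5, 0.4]),
}


def simulate(state, action, rng):
    """Sample one SMDP transition: reward rate, sojourn time, next state."""
    rate, mean_tau, probs = MODEL[(state, action)]
    tau = rng.expovariate(1.0 / mean_tau)
    next_state = rng.choices([NORMAL, HONEYPOT, ABSORBING], weights=probs)[0]
    return rate, tau, next_state


def smdp_q_learning(episodes=5000, seed=0):
    """Estimate Q-values from sampled transitions only."""
    rng = random.Random(seed)
    q = {key: 0.0 for key in MODEL}
    visits = {key: 0 for key in MODEL}
    for _ in range(episodes):
        state = NORMAL
        while state != ABSORBING:
            # Epsilon-greedy exploration over the two engagement levels.
            if rng.random() < EPSILON:
                action = rng.choice([LOW, HIGH])
            else:
                action = max((LOW, HIGH), key=lambda a: q[(state, a)])
            rate, tau, next_state = simulate(state, action, rng)
            # Reward accrued at a constant rate over the sojourn, discounted.
            discount = math.exp(-GAMMA * tau)
            reward = rate * (1.0 - discount) / GAMMA
            if next_state == ABSORBING:
                future = 0.0
            else:
                future = max(q[(next_state, LOW)], q[(next_state, HIGH)])
            # Learning rate decays with the visit count of this pair.
            key = (state, action)
            alpha = 10.0 / (10.0 + visits[key])
            visits[key] += 1
            q[key] += alpha * (reward + discount * future - q[key])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the action with the largest learned Q-value in each state."""
    return {s: max((LOW, HIGH), key=lambda a: q[(s, a)])
            for s in (NORMAL, HONEYPOT)}
```

Calling `greedy_policy(smdp_q_learning())` returns the learned engagement level for each non-absorbing state. The sojourn time enters the update twice, through the accumulated reward and through the discount applied to the next state's value, which is what separates this from Q-learning on an ordinary MDP.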