AI GT ROOct 21, 2024

Patrol Security Game: Defending Against Adversary with Freedom in Attack Timing, Location, and Duration

Hao-Tsung Yang, Ting-Kai Weng, Ting-Yu Chang, Kin Sum Liu, Shan Lin, Jie Gao, Shih-Yu Tsai

arXiv:2410.15600v12.31 citationsh-index: 11ACM Transactions on Cyber-Physical Systems

Originality Incremental advance

AI Analysis

This addresses security patrolling for robotics and crime prevention, but is incremental as it builds on existing game-theoretic models with new constraints.

The paper tackled the Patrol Security Game, a robotic patrolling problem where an attacker controls timing, location, and duration, by developing patrolling schedules to minimize attacker payoff. It showed that optimal solutions in zero-penalty cases involve minimizing expected hitting or return times efficiently, and proposed algorithms that outperform state-of-the-art baselines on synthetic and real-world crime datasets.

We explored the Patrol Security Game (PSG), a robotic patrolling problem modeled as an extensive-form Stackelberg game, where the attacker determines the timing, location, and duration of their attack. Our objective is to devise a patrolling schedule with an infinite time horizon that minimizes the attacker's payoff. We demonstrated that PSG can be transformed into a combinatorial minimax problem with a closed-form objective function. By constraining the defender's strategy to a time-homogeneous first-order Markov chain (i.e., the patroller's next move depends solely on their current location), we proved that the optimal solution in cases of zero penalty involves either minimizing the expected hitting time or return time, depending on the attacker model, and that these solutions can be computed efficiently. Additionally, we observed that increasing the randomness in the patrol schedule reduces the attacker's expected payoff in high-penalty cases. However, the minimax problem becomes non-convex in other scenarios. To address this, we formulated a bi-criteria optimization problem incorporating two objectives: expected maximum reward and entropy. We proposed three graph-based algorithms and one deep reinforcement learning model, designed to efficiently balance the trade-off between these two objectives. Notably, the third algorithm can identify the optimal deterministic patrol schedule, though its runtime grows exponentially with the number of patrol spots. Experimental results validate the effectiveness and scalability of our solutions, demonstrating that our approaches outperform state-of-the-art baselines on both synthetic and real-world crime datasets.

View on arXiv PDF

Similar