CE · Mar 12

Online Learning of Strategic Defense against Ecological Adversaries under Partial Observability with Semi-Bandit Feedback

Anjali Purathekandy, Deepak N. Subramani

arXiv:2603.11726v12.1 · h-index: 1

Predicted impact: top 100% in CE · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation in security games for domains like human-elephant conflict, where adversary behavior cannot be modeled a priori, offering incremental improvements over existing methods.

The paper tackles adaptive resource allocation against strategic ecological adversaries with unknown behavioral models and partial observability. It introduces the HERDS algorithm, which in the reported experiments reduces regret by 15-45% relative to Follow-the-Perturbed-Leader with Uniform Exploration (FPL-UE) and crop damage by 40-50% against adaptive adversaries.

We introduce an online learning algorithm for computing adaptive resource allocation policies against strategic ecological adversaries with unknown behavioral models and partial observability. Our setting addresses a fundamental limitation of security games: when adversary behavior cannot be modeled a priori, classical equilibrium-based approaches fail. We formulate the problem as regret minimization in a combinatorial action space with semi-bandit feedback, where payoffs are non-stationary and interdependent across targets. Our algorithm, named HERDS (Human-Elephant conflict mitigation through Resource Deployment for Strategic guarding), extends Follow-the-Perturbed-Leader (FPL) with three innovations: (1) simultaneous exploration-exploitation through dynamic budget partitioning driven by observed losses, (2) adaptive payoff estimation under confounded observations where attack entry points are unidentifiable, and (3) model-agnostic learning that provides regret guarantees without behavioral assumptions. We demonstrate our framework on Human-Elephant Conflict mitigation, a domain where intelligent ecological adversaries exhibit strategic behavior (optimal foraging, spatial memory, adaptive evasion) yet lack tractable behavioral models. Experiments using an Agent-Based Model calibrated with elephant movement data demonstrate 15-45% regret reduction versus Follow-the-Perturbed-Leader with Uniform-Exploration (FPL-UE), 40-50% crop damage reduction against adaptive adversaries, and convergence in 40-50 rounds versus 60-80 for baselines.
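To make the setting in the abstract concrete, the following is a minimal Python sketch of a generic Follow-the-Perturbed-Leader learner with semi-bandit feedback over a combinatorial action space (guard k of n targets per round). It is not HERDS and does not reproduce any of its three innovations. The geometric-resampling weight is a standard technique swapped in here for payoff estimation, and the stationary Bernoulli attack model, the function names (`perturbed_leader`, `geometric_resampling`, `run_fpl`), and the parameters (`eta`, `cap`, `attack_prob`) are all assumptions made for illustration only.

```python
import random


def perturbed_leader(cum_reward, k, eta, rng):
    """Return the k targets with the highest perturbed cumulative reward."""
    # Exponential noise is one common FPL perturbation (an assumption here).
    scores = [eta * r + rng.expovariate(1.0) for r in cum_reward]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def geometric_resampling(cum_reward, target, k, eta, rng, cap):
    """Estimate 1 / P(target is guarded) by redrawing the perturbed leader.

    Counts draws until `target` is selected, truncated at `cap` draws.
    """
    for draws in range(1, cap + 1):
        if target in perturbed_leader(cum_reward, k, eta, rng):
            return draws
    return cap


def run_fpl(attack_prob, k, rounds, eta=0.1, cap=50, seed=0):
    """Toy loop: guard k targets per round, observe only the guarded ones.

    `attack_prob[i]` is a hypothetical stationary attack rate for target i;
    the paper's setting is non-stationary, so this is a simplification.
    Returns how often each target was guarded.
    """
    rng = random.Random(seed)
    n = len(attack_prob)
    cum_reward = [0.0] * n   # importance-weighted reward estimates
    guard_counts = [0] * n
    for _ in range(rounds):
        guarded = perturbed_leader(cum_reward, k, eta, rng)
        updates = {}
        for i in sorted(guarded):
            guard_counts[i] += 1
            # Semi-bandit feedback: an interception (reward 1) is seen
            # only at targets that were actually guarded this round.
            if rng.random() < attack_prob[i]:
                updates[i] = geometric_resampling(
                    cum_reward, i, k, eta, rng, cap
                )
        # Apply all updates after the round so weights use one snapshot.
        for i, weight in updates.items():
            cum_reward[i] += weight
    return guard_counts


if __name__ == "__main__":
    print(run_fpl([0.9, 0.8, 0.1, 0.1, 0.1], k=2, rounds=200, seed=1))
```

In this sketch the unguarded targets receive no update, which is why the observed rewards are multiplied by an estimate of the inverse selection probability. How HERDS partitions its budget between exploration and exploitation, and how it handles unidentifiable entry points, is described in the paper itself and is not modeled here.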
