Anjali Purathekandy


2.1CEMar 12
Online Learning of Strategic Defense against Ecological Adversaries under Partial Observability with Semi-Bandit Feedback

Anjali Purathekandy, Deepak N. Subramani

We introduce an online learning algorithm for computing adaptive resource allocation policies against strategic ecological adversaries whose behavioral models are unknown and whose actions are only partially observable. Our setting addresses a fundamental limitation of security games: classical equilibrium-based approaches need a model of adversary behavior, so they break down when that behavior cannot be modeled a priori. We formulate the problem as regret minimization over a combinatorial action space with semi-bandit feedback, where payoffs are non-stationary and interdependent across targets. Our algorithm, HERDS (Human-Elephant conflict mitigation through Resource Deployment for Strategic guarding), extends Follow-the-Perturbed-Leader (FPL) with three innovations: (1) simultaneous exploration and exploitation through dynamic budget partitioning driven by observed losses; (2) adaptive payoff estimation under confounded observations, where attack entry points cannot be identified; and (3) model-agnostic learning with regret guarantees that require no behavioral assumptions. We demonstrate the framework on Human-Elephant Conflict mitigation, a domain in which intelligent ecological adversaries behave strategically (optimal foraging, spatial memory, adaptive evasion) yet lack tractable behavioral models. In experiments with an Agent-Based Model calibrated on elephant movement data, HERDS reduces regret by 15–45% relative to FPL with Uniform Exploration (FPL-UE), reduces crop damage by 40–50% against adaptive adversaries, and converges in 40–50 rounds versus 60–80 rounds for the baselines.
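To make the semi-bandit setting concrete, here is a minimal sketch of an FPL learner with uniform exploration, in the spirit of the FPL-UE baseline, for guarding k of n targets. It is not HERDS: dynamic budget partitioning and the confounded-observation estimator are not reproduced. Several choices are assumptions for illustration only. These include exponential perturbations, loss estimates formed by geometric resampling (a standard estimator of the inverse selection probability for FPL), scoring a guarded target with observed damage d as loss 1 - d, and the hypothetical names `sample_action`, `geometric_resampling`, `fpl_ue_semi_bandit`, and `best_fixed_unguarded` with their default parameter values.

```python
import numpy as np


def sample_action(cum_loss_est, k, gamma, eta, rng):
    """Draw one set of k guarded targets from the FPL-UE-style mixture (assumed form)."""
    n = cum_loss_est.size
    if rng.random() < gamma:
        # Uniform-exploration branch: guard a uniformly random k-subset.
        return rng.choice(n, size=k, replace=False)
    # Perturbed-leader branch: subtract i.i.d. Exp(1) noise from the scaled
    # cumulative loss estimates and guard the k targets with the smallest values.
    perturbed = eta * cum_loss_est - rng.exponential(size=n)
    return np.argpartition(perturbed, k - 1)[:k]


def geometric_resampling(target, cum_loss_est, k, gamma, eta, rng, max_draws):
    """Estimate 1/P(target is guarded) by redrawing until `target` is selected.

    The draw count is geometric with mean 1/p; capping it at `max_draws`
    bounds the variance at the cost of a small downward bias.
    """
    for draws in range(1, max_draws + 1):
        if target in sample_action(cum_loss_est, k, gamma, eta, rng):
            return draws
    return max_draws


def fpl_ue_semi_bandit(damage, k, gamma=0.1, eta=0.1, max_draws=200, seed=0):
    """Run the sketch on a T x n matrix of per-target damage values in [0, 1].

    Only damage at guarded targets is observed (semi-bandit feedback).
    Returns the total damage suffered at unguarded targets over all rounds.
    """
    rng = np.random.default_rng(seed)
    T, n = damage.shape
    cum_loss_est = np.zeros(n)
    unguarded_damage = 0.0
    for t in range(T):
        guarded = sample_action(cum_loss_est, k, gamma, eta, rng)
        mask = np.zeros(n, dtype=bool)
        mask[guarded] = True
        unguarded_damage += damage[t, ~mask].sum()
        # Resample from this round's distribution, i.e. before any estimate changes.
        snapshot = cum_loss_est.copy()
        for i in guarded:
            # Assumed scoring: guarding a target with damage d costs 1 - d,
            # so minimizing loss means guarding where damage is highest.
            k_hat = geometric_resampling(i, snapshot, k, gamma, eta, rng, max_draws)
            cum_loss_est[i] += (1.0 - damage[t, i]) * k_hat
    return float(unguarded_damage)


def best_fixed_unguarded(damage, k):
    """Unguarded damage of the best fixed k-subset chosen in hindsight."""
    totals = damage.sum(axis=0)
    return float(totals.sum() - np.sort(totals)[-k:].sum())


if __name__ == "__main__":
    # Synthetic damage with hot spots at higher target indices (illustrative only).
    damage = np.random.default_rng(1).random((200, 10)) * np.linspace(0.1, 1.0, 10)
    regret = fpl_ue_semi_bandit(damage, k=3) - best_fixed_unguarded(damage, k=3)
    print(f"static regret vs. best fixed allocation: {regret:.2f}")
```

Regret here is static, measured against the single best fixed allocation in hindsight. The abstract's setting has non-stationary payoffs, so the paper's regret notion may differ. The uniform-exploration mixture guarantees every target a selection probability of at least gamma·k/n. This keeps the geometric-resampling counts bounded in expectation, which is why it is kept in the sketch.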