77.3THMay 5
The Adversarial Discount - AI, Signal Correlation, and the Cybersecurity Arms RaceJames W. Bono
We study a contest-theoretic model of adversarial investment in which an attacker and a defender allocate resources to AI-augmented capabilities across multiple attack surfaces. The attacker's investment operates through two channels: it amplifies offensive potency unconditionally and erodes defensive effectiveness conditionally, generating an adversarial discount that deepens endogenously with the defender's own investment. We derive a closed-form arms race ratio decomposing the relative marginal effectiveness of offensive and defensive investment into six structural primitives and establish equilibrium uniqueness and global convergence under a continuous best-response dynamic. The central result concerns signal cross-correlation, the degree to which threat intelligence on one surface informs detection on another. With full cross-correlation, the arms race ratio is independent of the number of attack surfaces: the attacker's structural advantage from surface proliferation is completely neutralized. Under the benchmark full-dilution case, without cross-correlation, per-surface defense effectiveness vanishes as the attack surface grows. Extending the analysis to heterogeneous defenders facing an attacker who targets by expected value, we argue that the model points to a dual inefficiency: overinvestment in private defense (a zero-sum redirective externality) and underinvestment in shared signal correlation (a public good). These formal results, together with public-good reasoning outside the base model, characterize when collective information aggregation can dominate private capability investment as the decisive margin in adversarial contests.
AISep 19, 2017
Deep Reinforcement Learning for Event-Driven Multi-Agent Decision ProcessesKunal Menda, Yi-Chun Chen, Justin Grana et al.
The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.