Michael Dorothy

MA
5papers
50citations
Novelty39%
AI Score45

5 Papers

MAMar 24
Dynamic Adversarial Resource Allocation: the dDAB Game

Yue Guan, Daigo Shishika, Jason R. Marden et al. · gatech

This work introduces the dynamic Defender-Attacker Blotto (dDAB) game, extending the classical static Blotto game to a dynamic resource allocation setting over graphs. In the dDAB game, a defender is required to maintain numerical superiority against attacker resources across a set of key nodes in a connected graph. The engagement unfolds as a discrete-time game, where each player reallocates its resources in turn, with resources allowed to move at most one hop per time step. The primary goal is to determine the necessary and sufficient amount of defender resources required to guarantee sustained defense, along with the corresponding strategies. To address the central challenge arising from graph-constrained resource reallocation, we conduct a reachability analysis, starting with simplified settings where attacker resources act as a single cohesive group. We then extend the framework to allow attacker resources to split and merge arbitrarily, and construct defender strategies using superposition principles. A set-based dynamic programming algorithm is developed to compute the optimal strategies, as well as the minimum amount of defender resources to ensure successful defense. The effectiveness of our approach is demonstrated through numerical simulations and hardware experiments on the Georgia Tech Robotarium platform.

SYMar 27
Optimal Hiding with Partial Information of the Seeker's Route

Prajakta Surve, Shaunak D. Bopardikar, Daigo Shishika et al.

We consider a hide-and-seek game between a Hider and a Seeker over a finite set of locations. The Hider chooses one location to conceal a stationary treasure, while the Seeker visits the locations sequentially along a route. As the search progresses, the Hider observes a prefix of the Seeker's route. After observing this information, the Hider has the option to relocate the treasure at most once to another unvisited location by paying a switching cost. We study two seeker models. In the first, the Seeker is unaware of the fact that the Hider can relocate. In the second, the Seeker select its route while accounting for the possibility that the Hider observes its path and reallocates. For the restricted case, we define the value-of-information created by the reveal and derive upper bounds in terms of the switching cost using a worst-case evaluation over routes. We also show that seeker awareness reduces the game value, with the difference between the restricted and feedback models bounded by the entry-wise gap between the corresponding payoff matrices. Numerical examples show how this benefit decreases as the switching cost increases and as the reveal occurs later along the route.

LGMar 16
Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions

Goutam Das, Michael Dorothy, Kyle Volle et al.

Game theory provides the gold standard for analyzing adversarial engagements, offering strong optimality guarantees. However, these guarantees often become brittle when assumptions such as perfect information are violated. Reinforcement learning (RL), by contrast, is adaptive but can be sample-inefficient in large, complex domains. This paper introduces a hybrid approach that leverages game-theoretic insights to improve RL training efficiency. We study a border defense game with limited perceptual range, where defender performance depends on both search and pursuit strategies, making classical differential game solutions inapplicable. Our method employs the Apollonius Circle (AC) to compute equilibrium in the post-detection phase, enabling early termination of RL episodes without learning pursuit dynamics. This allows RL to concentrate on learning search strategies while guaranteeing optimal continuation after detection. Across single- and multi-defender settings, this early termination method yields 10-20% higher rewards, faster convergence, and more efficient search trajectories. Extensive experiments validate these findings and demonstrate the overall effectiveness of our approach.

SYFeb 3
C-IDS: Solving Contextual POMDP via Information-Directed Objective

Chongyang Shi, Michael Dorothy, Jie Fu

We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K episodes. We evaluate our approach on a continuous Light-Dark environment and show that it consistently outperforms standard POMDP solvers that treat the unknown context as a latent state variable, achieving faster context identification and higher returns.

MAJul 29, 2021
Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training

Piyush K. Sharma, Rolando Fernandez, Erin Zaroukian et al.

Much work has been dedicated to the exploration of Multi-Agent Reinforcement Learning (MARL) paradigms implementing a centralized learning with decentralized execution (CLDE) approach to achieve human-like collaboration in cooperative tasks. Here, we discuss variations of centralized training and describe a recent survey of algorithmic approaches. The goal is to explore how different implementations of information sharing mechanism in centralized learning may give rise to distinct group coordinated behaviors in multi-agent systems performing cooperative tasks.