Zongying Shi

SY, Nov 18, 2019

Guarding a Subspace in High-Dimensional Space with Two Defenders and One Attacker

Rui Yan, Zongying Shi, Yisheng Zhong

This paper considers a subspace guarding game in high-dimensional space, which consists of a play subspace and a target subspace. Two faster defenders cooperate to protect the target subspace by capturing an attacker that strives to enter the target subspace from the play subspace without being captured. A closed-form solution is provided from the perspectives of kind and degree. The contributions include the use of the attack subspace (AS) method to construct the barrier, by which the game winner can be perfectly predicted before the game starts. In addition, with this a priori information about the game result, a critical payoff function is designed for the case in which the defenders can win the game. The optimal strategy for each player is then explicitly reformulated as a saddle-point equilibrium. Finally, we apply these theoretical results to a half-space guarding game in three-dimensional space. Since all of the results are analytical, they require little memory, impose no computational burden, and allow for real-time updates, which is beyond the capacity of the traditional Hamilton-Jacobi-Isaacs method. It is worth noting that this is the first work to consider target guarding games in arbitrary high-dimensional space, and in a fully analytical form.

GT, Jun 17, 2020

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

Rui Yan, Xiaoming Duan, Zongying Shi et al.

This paper introduces two metrics (cycle-based and memory-based), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest values of the underlying metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability are within any given tolerance of the optimal. The proposed perturbed SBRD addresses opponent-induced non-stationarity by fixing the strategies of the other agents while one agent learns, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile reached through the perturbation.
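To make the notions of strict best response dynamics and sink equilibrium concrete, the following is a minimal sketch on small normal-form games. It rests on assumptions that do not come from the abstract: an edge is drawn from a joint profile to every profile in which one player switches to a best response that strictly improves that player's payoff, and a sink equilibrium is read as a strongly connected set of profiles with no outgoing edges. The function names and the two example games are hypothetical, and the cycle-based and memory-based metrics, the perturbation, and the stochastic-game setting of the paper are not reproduced here.

```python
from itertools import product


def strict_best_response_graph(payoffs, n_actions):
    """Build a directed graph over joint action profiles.

    payoffs[profile] is a tuple with one payoff per player.
    Assumption: an edge profile -> profile' exists when exactly one player
    switches to a best response that strictly improves that player's payoff.
    """
    graph = {}
    for profile in product(*(range(n) for n in n_actions)):
        successors = set()
        for player, n in enumerate(n_actions):
            # All unilateral deviations of this player (including staying put).
            deviations = [
                profile[:player] + (a,) + profile[player + 1:] for a in range(n)
            ]
            best = max(payoffs[d][player] for d in deviations)
            if best > payoffs[profile][player]:
                successors.update(
                    d for d in deviations if payoffs[d][player] == best
                )
        graph[profile] = successors
    return graph


def reachable(graph, start):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_components(graph):
    """Return the strongly connected components that have no outgoing edges."""
    reach = {v: frozenset(reachable(graph, v)) for v in graph}
    # v lies in a sink component iff everything it reaches can reach it back.
    return {
        reach[v] for v in graph if all(reach[u] == reach[v] for u in reach[v])
    }


# Prisoner's dilemma (0 = cooperate, 1 = defect): one pure Nash equilibrium.
prisoners_dilemma = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
# Matching pennies: no pure Nash equilibrium, best responses cycle forever.
matching_pennies = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}

pd_sinks = sink_components(strict_best_response_graph(prisoners_dilemma, [2, 2]))
mp_sinks = sink_components(strict_best_response_graph(matching_pennies, [2, 2]))
print(pd_sinks)  # a single sink containing only mutual defection, (1, 1)
print(mp_sinks)  # a single sink containing all four profiles (a cycle)
```

In the prisoner's dilemma the only sink is the pure Nash equilibrium, while in matching pennies the sink is the whole best-response cycle. This illustrates why a sink-based concept can handle cyclical behavior that a pure Nash equilibrium cannot capture.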
