MA LGSep 25, 2019

$α^α$-Rank: Practically Scaling $α$-Rank through Stochastic Optimisation

Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar

arXiv:1909.11628v65.121 citationsh-index: 32

Originality Incremental advance

AI Analysis

This addresses a critical scalability problem for researchers and practitioners in multi-agent systems, allowing large-scale evaluations previously infeasible, though it is incremental as it builds directly on α-Rank.

The paper tackles the scalability issue of α-Rank, a graph-based algorithm for ranking joint policy profiles in multi-agent systems, by proposing α^α-Rank, a stochastic implementation with a double oracle mechanism that reduces joint strategy spaces, achieving a 1000x speed reduction on 10^4 × 10^4 random matrices and enabling evaluation on settings with up to 33 million joint strategies.

Recently, $α$-Rank, a graph-based algorithm, has been proposed as a solution to ranking joint policy profiles in large scale multi-agent systems. $α$-Rank claimed tractability through a polynomial time implementation with respect to the total number of pure strategy profiles. Here, we note that inputs to the algorithm were not clearly specified in the original presentation; as such, we deem complexity claims as not grounded, and conjecture solving $α$-Rank is NP-hard. The authors of $α$-Rank suggested that the input to $α$-Rank can be an exponentially-sized payoff matrix; a claim promised to be clarified in subsequent manuscripts. Even though $α$-Rank exhibits a polynomial-time solution with respect to such an input, we further reflect additional critical problems. We demonstrate that due to the need of constructing an exponentially large Markov chain, $α$-Rank is infeasible beyond a small finite number of agents. We ground these claims by adopting amount of dollars spent as a non-refutable evaluation metric. Realising such scalability issue, we present a stochastic implementation of $α$-Rank with a double oracle mechanism allowing for reductions in joint strategy spaces. Our method, $α^α$-Rank, does not need to save exponentially-large transition matrix, and can terminate early under required precision. Although theoretically our method exhibits similar worst-case complexity guarantees compared to $α$-Rank, it allows us, for the first time, to practically conduct large-scale multi-agent evaluations. On $10^4 \times 10^4$ random matrices, we achieve $1000x$ speed reduction. Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $α$-Rank with reasonable computational budget.

View on arXiv PDF

Similar