GT LG SY

Feb 29, 2024

Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games

Tao Li, Kim Hammar, Rolf Stadler, Quanyan Zhu

arXiv:2402.18781v410.817 citations

h-index: 13

CDC

Originality: Highly original

AI Analysis

This paper addresses the challenge of online adaptation in complex socio-technical systems such as cyber-physical systems, offering a novel method for a known bottleneck in asymmetric information stochastic games (AISGs).

The paper tackles the problem of adapting strategies in AISGs under generic information structures. It proposes conjectural online learning (COL), which combines a forecaster-actor-critic architecture with Bayesian learning and converges faster than state-of-the-art reinforcement learning methods in nonstationary environments.

Asymmetric information stochastic games (AISGs) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for AISGs are primarily offline and cannot adapt to equilibrium deviations. Further, current methods are limited to particular information structures to avoid belief hierarchies. Considering these limitations, we propose conjectural online learning (COL), an online learning method under generic information structures in AISGs. COL uses a forecaster-actor-critic (FAC) architecture, where subjective forecasts are used to conjecture the opponents' strategies within a lookahead horizon, and Bayesian learning is used to calibrate the conjectures. To adapt strategies to nonstationary environments based on information feedback, COL uses online rollout with cost function approximation (actor-critic). We prove that the conjectures produced by COL are asymptotically consistent with the information feedback in the sense of a relaxed Bayesian consistency. We also prove that the empirical strategy profile induced by COL converges to the Berk-Nash equilibrium, a solution concept characterizing rationality under subjectivity. Experimental results from an intrusion response use case demonstrate that COL converges faster than state-of-the-art reinforcement learning methods against nonstationary attacks.
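To make the "Bayesian learning is used to calibrate the conjectures" step concrete, here is a minimal sketch, not the paper's algorithm. It assumes a finite set of candidate conjectures, each a fixed distribution over two opponent actions, and it assumes the opponent's action is observed directly as feedback. The function name `update_conjecture_posterior`, the candidate set, and the observation sequence are all hypothetical and chosen only for illustration.

```python
def update_conjecture_posterior(prior, likelihoods):
    """One Bayesian step: posterior is proportional to prior times likelihood.

    prior       -- current probability of each candidate conjecture
    likelihoods -- probability each candidate assigns to the observed feedback
    """
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # The feedback is impossible under every candidate; keep the prior.
        return list(prior)
    return [u / total for u in unnormalized]


# Hypothetical candidate conjectures about the opponent's strategy:
# each row is a distribution over opponent actions 0 and 1.
candidate_conjectures = [
    [0.9, 0.1],  # opponent mostly plays action 0
    [0.5, 0.5],  # opponent mixes uniformly
    [0.1, 0.9],  # opponent mostly plays action 1
]

# Start from a uniform prior over the three candidates.
posterior = [1.0 / 3.0] * 3

# Assumed feedback: a short sequence of observed opponent actions.
observed_actions = [1, 1, 0, 1, 1]

for action in observed_actions:
    likelihoods = [conjecture[action] for conjecture in candidate_conjectures]
    posterior = update_conjecture_posterior(posterior, likelihoods)

# Index of the conjecture that best explains the feedback so far.
most_likely = max(range(len(posterior)), key=lambda i: posterior[i])
```

In this toy run the posterior mass shifts toward the third candidate, since the observed sequence is dominated by action 1. In the setting the abstract describes, feedback is generally partial and the environment is nonstationary, so the likelihood model would be more involved than this direct action lookup.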
