OCMar 18, 2018
A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric informationDeepanshu Vasal, Abhinav Sinha, Achilleas Anastasopoulos
We consider finite-horizon and infinite-horizon versions of a dynamic game with $N$ selfish players who observe their types privately and take actions that are publicly observed. Players' types evolve as conditionally independent Markov processes, conditioned on their current actions. Their actions and types jointly determine their instantaneous rewards. In dynamic games with asymmetric information, a widely used concept of equilibrium is perfect Bayesian equilibrium (PBE), which consists of a strategy and belief pair that simultaneously satisfy sequential rationality and belief consistency. In general, there does not exist a universal algorithm that decouples the interdependence of strategies and beliefs over time in calculating PBE. In this paper, for the finite-horizon game with independent types we develop a two-step backward-forward recursive algorithm that sequentially decomposes the problem (w.r.t. time) to obtain a subset of PBEs, which we refer to as structured Bayesian perfect equilibria (SPBE). In such equilibria, a player's strategy depends on its history only through a common public belief and its current private type. The backward recursive part of this algorithm defines an equilibrium generating function. Each period in the backward recursion involves solving a fixed-point equation on the space of probability simplexes for every possible belief on types. Using this function, equilibrium strategies and beliefs are generated through a forward recursion. We then extend this methodology to the infinite-horizon model, where we propose a time-invariant single-shot fixed-point equation, which in conjunction with a forward recursive step, generates the SPBE. Sufficient conditions for the existence of SPBE are provided. With our proposed method, we find equilibria that exhibit signaling behavior. This is illustrated with the help of a concrete public goods example.
GTJun 15, 2016
Signaling equilibria for dynamic LQG games with asymmetric informationDeepanshu Vasal, Achilleas Anastasopoulos
We consider a finite horizon dynamic game with two players who observe their types privately and take actions, which are publicly observed. Players' types evolve as independent, controlled linear Gaussian processes and players incur quadratic instantaneous costs. This forms a dynamic linear quadratic Gaussian (LQG) game with asymmetric information. We show that under certain conditions, players' strategies that are linear in their private types, together with Gaussian beliefs form a perfect Bayesian equilibrium (PBE) of the game. Furthermore, it is shown that this is a signaling equilibrium due to the fact that future beliefs on players' types are affected by the equilibrium strategies. We provide a backward-forward algorithm to find the PBE. Each step of the backward algorithm reduces to solving an algebraic matrix equation for every possible realization of the state estimate covariance matrix. The forward algorithm consists of Kalman filter recursions, where state estimate covariance matrices depend on equilibrium strategies.
GTApr 9, 2018
Decentralized Bayesian learning in dynamic games: A framework for studying informational cascadesDeepanshu Vasal, Achilleas Anastasopoulos
We study the problem of Bayesian learning in a dynamical system involving strategic agents with asymmetric information. In a series of seminal papers in the literature, this problem has been investigated under a simplifying model where myopically selfish players appear sequentially and act once in the game, based on private noisy observations of the system state and public observation of past players' actions. It has been shown that there exist information cascades where users discard their private information and mimic the action of their predecessor. In this paper, we provide a framework for studying Bayesian learning dynamics in a more general setting than the one described above. In particular, our model incorporates cases where players are non-myopic and strategically participate for the whole duration of the game, and cases where an endogenous process selects which subset of players will act at each time instance. The proposed framework hinges on a sequential decomposition methodology for finding structured perfect Bayesian equilibria (PBE) of a general class of dynamic games with asymmetric information, where user-specific states evolve as conditionally independent Markov processes and users make independent noisy observations of their states. Using this methodology, we study a specific dynamic learning model where players make decisions about public investment based on their estimates of everyone's types. We characterize a set of informational cascades for this problem where learning stops for the team as a whole. We show that in such cascades, all players' estimates of other players' types freeze even though each individual player asymptotically learns its own true type.
GTApr 13, 2018
Incentive design for learning in user-recommendation systems with time-varying statesDeepanshu Vasal, Vijay Subramanian, Achilleas Anastasopoulos
We consider the problem of how strategic users with asymmetric information can learn an underlying time varying state in a user-recommendation system. Users who observe private signals about the state, sequentially make a decision about buying a product whose value varies with time in an ergodic manner. We formulate the team problem as an instance of decentralized stochastic control problem and characterize its optimal policies. With strategic users, we design incentives such that users reveal their true private signals, so that the gap between the strategic and team objective is small and the overall expected incentive payments are also small.
GTOct 21, 2019
Markov perfect equilibria in non-stationary mean-field gamesDeepanshu Vasal
In this paper, we consider both finite and infinite horizon discounted dynamic mean-field games where there is a large population of homogeneous players sequentially making strategic decisions and each player is affected by other players through an aggregate population state. Each player has a private type that only she observes. Such games have been studied in the literature under simplifying assumption that population state dynamics are stationary. In this paper, we consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute Markov perfect equilibrium (MPE) that depend on both, a player's private type, and current (dynamic) population state. Using this algorithm, we study a security problem in cyberphysical system where infected nodes put negative externality on the system, and each node makes a decision to get vaccinated. We numerically compute MPE of the game.
LGDec 11, 2021
Convergence of Generalized Belief Propagation Algorithm on Graphs with MotifsYitao Chen, Deepanshu Vasal
Belief propagation is a fundamental message-passing algorithm for numerous applications in machine learning. It is known that belief propagation algorithm is exact on tree graphs. However, belief propagation is run on loopy graphs in most applications. So, understanding the behavior of belief propagation on loopy graphs has been a major topic for researchers in different areas. In this paper, we study the convergence behavior of generalized belief propagation algorithm on graphs with motifs (triangles, loops, etc.) We show under a certain initialization, generalized belief propagation converges to the global optimum of the Bethe free energy for ferromagnetic Ising models on graphs with motifs.
AINov 6, 2020
Multi-Agent Decentralized Belief Propagation on GraphsYitao Chen, Deepanshu Vasal
We consider the problem of interactive partially observable Markov decision processes (I-POMDPs), where the agents are located at the nodes of a communication network. Specifically, we assume a certain message type for all messages. Moreover, each agent makes individual decisions based on the interactive belief states, the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. We propose a decentralized belief propagation algorithm for the problem, and prove the convergence of our algorithm. Finally we show multiple applications of our framework. Our work appears to be the first study of decentralized belief propagation algorithm for networked multi-agent I-POMDPs.
OCMay 24, 2020
Model-free Reinforcement Learning for Stochastic Stackelberg Security GamesRajesh K Mishra, Deepanshu Vasal, Sriram Vishwanath
In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy which maximizes its own returns given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently,~[1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games which allow for the computation of Markovian equilibrium policies in linear time as opposed to double exponential, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, to propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm. by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm.
GTMay 16, 2019
Sequential decomposition of repeated games with asymmetric information and dependent statesDeepanshu Vasal
We consider a finite horizon repeated game with $N$ selfish players who observe their types privately and take actions, which are publicly observed. Their actions and types jointly determine their instantaneous rewards. In each period, players jointly observe actions of each other with delay 1, and private observations of the state of the system, and get an instantaneous reward which is a function of the state and everyone's actions. The players' types are static and are potentially correlated among players. An appropriate notion of equilibrium for such games is Perfect Bayesian Equilibrium (PBE) which consists of a strategy and a belief profile of the players which is coupled across time and as a result, the complexity of finding such equilibria grows double-exponentially in time. We present a sequential decomposition methodology to compute \emph{structured perfect Bayesian equilibria} (SPBE) of this game, introduced in~\cite{VaAn15arxiv}, where equilibrium policy of a player is a function of a common belief and a private state. This methodology computes SPBE in linear time. In general, the SPBE of the game problem exhibit \textit{signaling} behavior, i.e. players' actions reveal part of their private information that is payoff relevant to other players.