Umarbek Guvercin

66.8 SY May 26

Private & Common Information States in Decentralized Team Equilibrium via Dynamic Programming for POMDPs with Delayed Sharing

Charalambos D. Charalambous, Umarbek Guvercin, Seddik Djouadi

Witsenhausen, in his seminal 1971 paper [1], introduced decentralized partially observable Markov decision problems (POMDPs) in which multiple agents, or controls, operate under T-step delayed sharing information patterns. A fundamental problem in [1] is the identification of structural properties of optimal strategies that compress the information patterns into multiple information states. In this paper, we develop such structural properties of optimal strategies and the associated dynamic programming (DP) equations, using the concept of decentralized sequential team equilibrium (a generalization of person-by-person optimality from static team theory). Within this framework, each strategy is assigned an individual value function conditioned on its delayed sharing information pattern, while the strategies of all other agents are held fixed. The resulting DP framework yields several new DP equations and characterizations of decentralized team equilibrium. Moreover, these DP equations exhibit fundamental properties analogous to those of centralized DP for POMDPs: the optimization in each agent's DP equations is performed over the agent's action space rather than over strategy spaces; each agent's multiple information states satisfy Markov recursions; and a separation principle holds. The DP equations also reveal a structural compression property of optimal strategies, in which each agent compresses its delayed sharing information pattern into three components: 1) a private posterior distribution conditioned on the agent's delayed sharing information pattern, 2) a centralized posterior distribution conditioned on the common information shared by all agents, and 3) the agent's private information component. This structural result substantially extends Witsenhausen's Assertion 8 in [1].
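As a rough illustration of what it means for an information state to satisfy a Markov recursion, the sketch below implements the standard Bayes filter for a finite-state, centralized POMDP: the new posterior depends only on the previous posterior, the action taken, and the new observation. It is not the recursion derived in the paper. The function name `belief_update`, the nested-list layout of `T` and `O`, and the two-state numbers are all assumptions made for this example.

```python
def belief_update(belief, action, observation, T, O):
    """One step of the finite-state Bayes filter (a centralized information state).

    belief[x]        : current posterior P(x_t = x | history)
    T[action][x][x2] : P(x_{t+1} = x2 | x_t = x, action)
    O[action][x2][y] : P(y_{t+1} = y | x_{t+1} = x2, action)
    All names and layouts here are illustrative assumptions.
    """
    n = len(belief)
    # Prediction: push the current belief through the transition kernel.
    predicted = [
        sum(belief[x] * T[action][x][x2] for x in range(n)) for x2 in range(n)
    ]
    # Correction: weight each state by the likelihood of the new observation.
    unnormalized = [predicted[x2] * O[action][x2][observation] for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the result is again a probability distribution.
    return [w / total for w in unnormalized]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, 0, 1, T, O)
```

Because `belief_update` takes only the previous belief, the action, and the observation, the full history never has to be stored. This is the compression role that the abstract attributes to the private and centralized posterior distributions.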

76.0 SY Apr 25

Private and Common Information States in Decentralized Parallel Dynamic Programming for Delayed Sharing Patterns

Charalambos D. Charalambous, Umarbek Guvercin, Seddik Djouadi

This paper develops a dynamic programming (DP) approach for decentralized stochastic optimal control problems with delayed sharing information patterns. The approach exhibits the fundamental properties of classical DP for centralized partially observable Markov decision problems (POMDPs): the value functions and information states depend on the actions of the minimizing controls and not on their strategies. This is achieved by invoking the concept of Person-by-Person (PbP) optimality, in which each control strategy is associated with a value function conditioned on its assigned delayed sharing information pattern, while all other strategies are fixed to their optimal responses. The value functions satisfy generalized and simplified DP equations, which are used to derive necessary and sufficient conditions for PbP optimality. The simplified DP equations are obtained by invoking the structural property that optimal strategies are separated and are functionals of two information states: 1) a private a posteriori probability distribution based on the information pattern of the strategy, and 2) a centralized a posteriori probability distribution based on the information shared by, or common to, all strategies, each satisfying a Markov recursion. The DP approach of this paper settles a long-standing open problem, dating to the appearance of T-step delayed sharing patterns in [1, Section IV.G], of generalizing the fundamental properties of the classical DP approach.
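The idea of PbP optimality can be illustrated in a much simpler setting than the one treated in the paper: a static team of two agents with finitely many actions and a shared cost table. In the sketch below, a pair of actions is PbP optimal when neither agent can lower the common cost by changing only its own action while the other's is held fixed. The names `is_pbp_optimal` and `best_response_iteration` and the cost table are assumptions made for this example, and no information patterns or dynamics are modeled.

```python
def is_pbp_optimal(cost, u1, u2):
    """Check person-by-person optimality of (u1, u2) for a shared cost table.

    cost[a][b] is the common cost when agent 1 plays a and agent 2 plays b.
    """
    # Best cost agent 1 can reach with agent 2's action held fixed.
    best_for_1 = min(cost[a][u2] for a in range(len(cost)))
    # Best cost agent 2 can reach with agent 1's action held fixed.
    best_for_2 = min(cost[u1][b] for b in range(len(cost[0])))
    return cost[u1][u2] == best_for_1 and cost[u1][u2] == best_for_2


def best_response_iteration(cost, u1, u2, max_iters=100):
    """Alternate best responses until neither agent can strictly improve."""
    for _ in range(max_iters):
        changed = False
        # Agent 1 responds to agent 2's current action.
        a = min(range(len(cost)), key=lambda i: cost[i][u2])
        if cost[a][u2] < cost[u1][u2]:
            u1, changed = a, True
        # Agent 2 responds to agent 1's (possibly updated) action.
        b = min(range(len(cost[0])), key=lambda j: cost[u1][j])
        if cost[u1][b] < cost[u1][u2]:
            u2, changed = b, True
        if not changed:
            break
    return u1, u2


# Hypothetical 2x2 shared cost table.
cost = [[1, 5],
        [5, 0]]
pair = best_response_iteration(cost, 0, 0)
```

Each update is accepted only if it strictly lowers the shared cost, so on a finite table the iteration stops at a pair that passes `is_pbp_optimal`. Which pair it reaches can depend on the starting actions.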

2 Papers