Michael P. Wellman

AI
h-index6
23papers
950citations
Novelty45%
AI Score42

23 Papers

AIFeb 1, 2023
Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

Zun Li, Marc Lanctot, Kevin R. McKee et al.

Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heurstics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Respoonse (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be plug and play in a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO), to automate the generation of an \emph{offline opponent model} via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up an opponent mixture, which we find identifying profiles that are near the Pareto frontier. Then GenBR keeps updating an \emph{online opponent model} and reacts against it during gameplay. We conduct behavioral studies where human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare and Nash bargaining score negotiating with humans as humans trading among themselves.

11.4GTMay 4
Learning Bayesian Game Families, with Application to Mechanism Design

Madelyn Gatchel, Michael P. Wellman

Learning or estimating game models from data typically entails inducing separate models for each setting, even if the games are parametrically related. In empirical mechanism design, for example, this approach requires learning a new game model for each candidate setting of the mechanism parameter. Recent work has shown the data efficiency benefits of learning a single parameterized model for families of related games. In Bayesian games -- a typical model for mechanism design -- payoffs depend on both the actions and types of the players. We show how to exploit this structure by learning an interim game-family model that conditions on a single player's type. We compare this to the baseline approach of directly learning the ex ante payoff function, which gives payoffs in expectation of all player types. By marginalizing over player type, the interim model can also provide ex ante payoff predictions, as necessary for Bayes-Nash equilibrium approximation. We also leverage the interim model to compute new beneficial piecewise best-response strategies, without any additional sample data. We validate our method through a case study of a dynamic sponsored search auction. For both payoff accuracy and Nash-approximation error, the interim model matches the ex ante model on the trained range, and outperforms ex ante in extrapolation. Our case study demonstrates that Bayesian game-family models can support comprehensive mechanism design, and that through interim-stage modeling we can enhance expressivity and reliability.

GTFeb 5, 2025
Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO

Christine Konicki, Mithun Chakraborty, Michael P. Wellman

Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.

MAMay 23, 2023
Co-Learning Empirical Games and World Models

Max Olan Smith, Michael P. Wellman

Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO -- a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.

GTJan 16, 2014
Multiattribute Auctions Based on Generalized Additive Independence

Yagil Engel, Michael P. Wellman

We develop multiattribute auctions that accommodate generalized additive independent (GAI) preferences. We propose an iterative auction mechanism that maintains prices on potentially overlapping GAI clusters of attributes, thus decreases elicitation and computational burden, and creates an open competition among suppliers over a multidimensional domain. Most significantly, the auction is guaranteed to achieve surplus which approximates optimal welfare up to a small additive factor, under reasonable equilibrium strategies of traders. The main departure of GAI auctions from previous literature is to accommodate non-additive trader preferences, hence allowing traders to condition their evaluation of specific attributes on the value of other attributes. At the same time, the GAI structure supports a compact representation of prices, enabling a tractable auction process. We perform a simulation study, demonstrating and quantifying the significant efficiency advantage of more expressive preference modeling. We draw random GAI-structured utility functions with various internal structures, generate additive functions that approximate the GAI utility, and compare the performance of the auctions using the two representations. We find that allowing traders to express existing dependencies among attributes improves the economic efficiency of multiattribute auctions.

GTJun 3, 2013
Analyzing Incentives for Protocol Compliance in Complex Domains: A Case Study of Introduction-Based Routing

Michael P. Wellman, Tae Hyung Kim, Quang Duong

Formal analyses of incentives for compliance with network protocols often appeal to game-theoretic models and concepts. Applications of game-theoretic analysis to network security have generally been limited to highly stylized models, where simplified environments enable tractable study of key strategic variables. We propose a simulation-based approach to game-theoretic analysis of protocol compliance, for scenarios with large populations of agents and large policy spaces. We define a general procedure for systematically exploring a structured policy space, directed expressly to resolve the qualitative classification of equilibrium behavior as compliant or non-compliant. The techniques are illustrated and exercised through an extensive case study analyzing compliance incentives for introduction-based routing. We find that the benefits of complying with the protocol are particularly strong for nodes subject to attack, and the overall compliance level achieved in equilibrium, while not universal, is sufficient to support the desired security goals of the protocol.

AIMar 27, 2013
Qualitative Probabilistic Networks for Planning Under Uncertainty

Michael P. Wellman

Bayesian networks provide a probabilistic semantics for qualitative assertions about likelihood. A qualitative reasoner based on an algebra over these assertions can derive further conclusions about the influence of actions. While the conclusions are much weaker than those computed from complete probability distributions, they are still valuable for suggesting potential actions, eliminating obviously inferior plans, identifying important tradeoffs, and explaining probabilistic models.

AIMar 27, 2013
The Role of Calculi in Uncertain Inference Systems

Michael P. Wellman, David Heckerman

Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.

AIMar 27, 2013
Exploiting Functional Dependencies in Qualitative Probabilistic Reasoning

Michael P. Wellman

Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.

AIFeb 27, 2013
State-space Abstraction for Anytime Evaluation of Probabilistic Networks

Michael P. Wellman, Chao-Lin Liu

One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.

AIFeb 27, 2013
The Automated Mapping of Plans for Plan Recognition

Marcus J. Huber, Edmund H. Durfee, Michael P. Wellman

To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks.

AIFeb 20, 2013
Path Planning under Time-Dependent Uncertainty

Michael P. Wellman, Matthew Ford, Kenneth Larson

Standard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty.

AIFeb 20, 2013
Accounting for Context in Plan Recognition, with Application to Traffic Monitoring

David V. Pynadath, Michael P. Wellman

Typical approaches to plan recognition start from a representation of an agent's possible plans, and reason evidentially from observations of the agent's actions to assess the plausibility of the various candidates. A more expansive view of the task (consistent with some prior work) accounts for the context in which the plan was generated, the mental state and planning process of the agent, and consequences of the agent's actions in the world. We present a general Bayesian framework encompassing this view, and focus on how context can be exploited in plan recognition. We demonstrate the approach on a problem in traffic monitoring, where the objective is to induce the plan of the driver from observation of vehicle movements. Starting from a model of how the driver generates plans, we show how the highway context can appropriately influence the recognizer's interpretation of observed driver behavior.

AIFeb 13, 2013
Optimal Factory Scheduling using Stochastic Dominance A*

Peter R. Wurman, Michael P. Wellman

We examine a standard factory scheduling problem with stochastic processing and setup times, minimizing the expectation of the weighted number of tardy jobs. Because the costs of operators in the schedule are stochastic and sequence dependent, standard dynamic programming algorithms such as A* may fail to find the optimal schedule. The SDA* (Stochastic Dominance A*) algorithm remedies this difficulty by relaxing the pruning condition. We present an improved state-space search formulation for these problems and discuss the conditions under which stochastic scheduling problems can be solved optimally using SDA*. In empirical testing on randomly generated problems, we found that in 70%, the expected cost of the optimal stochastic solution is lower than that of the solution derived using a deterministic approximation, with comparable search effort.

GTFeb 13, 2013
Toward a Market Model for Bayesian Inference

David M. Pennock, Michael P. Wellman

We present a methodology for representing probabilistic relationships in a general-equilibrium economic model. Specifically, we define a precise mapping from a Bayesian network with binary nodes to a market price system where consumers and producers trade in uncertain propositions. We demonstrate the correspondence between the equilibrium prices of goods in this economy and the probabilities represented by the Bayesian network. A computational market model such as this may provide a useful framework for investigations of belief aggregation, distributed probabilistic inference, resource allocation under uncertainty, and other problems of decentralized uncertainty.

AIFeb 6, 2013
Representing Aggregate Belief through the Competitive Equilibrium of a Securities Market

David M. Pennock, Michael P. Wellman

We consider the problem of belief aggregation: given a group of individual agents with probabilistic beliefs over a set of uncertain events, formulate a sensible consensus or aggregate probability distribution over these events. Researchers have proposed many aggregation methods, although on the question of which is best the general consensus is that there is no consensus. We develop a market-based approach to this problem, where agents bet on uncertain events by buying or selling securities contingent on their outcomes. Each agent acts in the market so as to maximize expected utility at given securities prices, limited in its activity only by its own risk aversion. The equilibrium prices of goods in this market represent aggregate beliefs. For agents with constant risk aversion, we demonstrate that the aggregate probability exhibits several desirable properties, and is related to independently motivated techniques. We argue that the market-based approach provides a plausible mechanism for belief aggregation in multiagent systems, as it directly addresses self-motivated agent incentives for participation and for truthfulness, and can provide a decision-theoretic foundation for the "expert weights" often employed in centralized pooling techniques.

AIJan 30, 2013
Using Qualitative Relationships for Bounding Probability Distributions

Chao-Lin Liu, Michael P. Wellman

We exploit qualitative probabilistic relationships among variables for computing bounds of conditional probability distributions of interest in Bayesian networks. Using the signs of qualitative relationships, we can implement abstraction operations that are guaranteed to bound the distributions of interest in the desired direction. By evaluating incrementally improved approximate networks, our algorithm obtains monotonically tightening bounds that converge to exact distributions. For supermodular utility functions, the tightening bounds monotonically reduce the set of admissible decision alternatives as well.

AIJan 30, 2013
Incremental Tradeoff Resolution in Qualitative Probabilistic Networks

Chao-Lin Liu, Michael P. Wellman

Qualitative probabilistic reasoning in a Bayesian network often reveals tradeoffs: relationships that are ambiguous due to competing qualitative influences. We present two techniques that combine qualitative and numeric probabilistic reasoning to resolve such tradeoffs, inferring the qualitative relationship between nodes in a Bayesian network. The first approach incrementally marginalizes nodes that contribute to the ambiguous qualitative relationships. The second approach evaluates approximate Bayesian networks for bounds of probability distributions, and uses these bounds to determinate qualitative relationships in question. This approach is also incremental in that the algorithm refines the state spaces of random variables for tighter bounds until the qualitative relationships are resolved. Both approaches provide systematic methods for tradeoff resolution at potentially lower computational cost than application of purely numeric methods.

AIJan 23, 2013
Graphical Representations of Consensus Belief

David M. Pennock, Michael P. Wellman

Graphical models based on conditional independence support concise encodings of the subjective belief of a single agent. A natural question is whether the consensus belief of a group of agents can be represented with equal parsimony. We prove, under relatively mild assumptions, that even if everyone agrees on a common graph topology, no method of combining beliefs can maintain that structure. Even weaker conditions rule out local aggregation within conditional probability tables. On a more positive note, we show that if probabilities are combined with the logarithmic opinion pool (LogOP), then commonly held Markov independencies are maintained. This suggests a straightforward procedure for constructing a consensus Markov network. We describe an algorithm for computing the LogOP with time complexity comparable to that of exact Bayesian inference.

AIJan 16, 2013
Probabilistic State-Dependent Grammars for Plan Recognition

David V. Pynadath, Michael P. Wellman

Techniques for plan recognition under uncertainty require a stochastic model of the plan-generation process. We introduce Probabilistic State-Dependent Grammars (PSDGs) to represent an agent's plan-generation process. The PSDG language model extends probabilistic context-free grammars (PCFGs) by allowing production probabilities to depend on an explicit model of the planning agent's internal and external state. Given a PSDG description of the plan-generation process, we can then use inference algorithms that exploit the particular independence properties of the PSDG language to efficiently answer plan-recognition queries. The combination of the PSDG language model and inference algorithms extends the range of plan-recognition domains for which practical probabilistic inference is possible, as illustrated by applications in traffic monitoring and air combat.

GTJun 20, 2012
Constrained Automated Mechanism Design for Infinite Games of Incomplete Information

Yevgeniy Vorobeychik, Daniel Reeves, Michael P. Wellman

We present a functional framework for automated mechanism design based on a two-stage game model of strategic interaction between the designer and the mechanism participants, and apply it to several classes of two-player infinite games of incomplete information. At the core of our framework is a black-box optimization algorithm which guides the selection process of candidate mechanisms. Our approach yields optimal or nearly optimal mechanisms in several application domains using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametric design of indirect mechanisms.

AIJun 13, 2012
Knowledge Combination in Graphical Multiagent Model

Quang Duong, Michael P. Wellman, Satinder Singh

A graphical multiagent model (GMM) represents a joint distribution over the behavior of a set of agents. One source of knowledge about agents' behavior may come from gametheoretic analysis, as captured by several graphical game representations developed in recent years. GMMs generalize this approach to express arbitrary distributions, based on game descriptions or other sources of knowledge bearing on beliefs about agent behavior. To illustrate the flexibility of GMMs, we exhibit game-derived models that allow probabilistic deviation from equilibrium, as well as models based on heuristic action choice. We investigate three different methods of integrating these models into a single model representing the combined knowledge sources. To evaluate the predictive performance of the combined model, we treat as actual outcome the behavior produced by a reinforcement learning process. We find that combining the two knowledge sources, using any of the methods, provides better predictions than either source alone. Among the combination methods, mixing data outperforms the opinion pool and direct update methods investigated in this empirical trial.

GTFeb 14, 2012
The Structure of Signals: Causal Interdependence Models for Games of Incomplete Information

Michael P. Wellman, Lu Hong, Scott E. Page

Traditional economic models typically treat private information, or signals, as generated from some underlying state. Recent work has explicated alternative models, where signals correspond to interpretations of available information. We show that the difference between these formulations can be sharply cast in terms of causal dependence structure, and employ graphical models to illustrate the distinguishing characteristics. The graphical representation supports inferences about signal patterns in the interpreted framework, and suggests how results based on the generated model can be extended to more general situations. Specific insights about bidding games in classical auction mechanisms derive from qualitative graphical models.