Bart van den Broek

2papers

2 Papers

SYMar 15, 2012
Risk Sensitive Path Integral Control

Bart van den Broek, Wim Wiegerinck, Hilbert Kappen

Recently path integral methods have been developed for stochastic optimal control for a wide class of models with non-linear dynamics in continuous space-time. Path integral methods find the control that minimizes the expected cost-to-go. In this paper we show that under the same assumptions, path integral methods generalize directly to risk sensitive stochastic optimal control. Here the method minimizes in expectation an exponentially weighted cost-to-go. Depending on the exponential weight, risk seeking or risk averse behaviour is obtained. We demonstrate the approach on risk sensitive stochastic optimal control problems beyond the linear-quadratic case, showing the intricate interaction of multi-modal control with risk sensitivity.

MAJun 27, 2012
Stochastic Optimal Control in Continuous Space-Time Multi-Agent Systems

Wim Wiegerinck, Bart van den Broek, Hilbert Kappen

Recently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.