Bruno Bouchard

2.5 | OC | Oct 19, 2012

Weak Dynamic Programming for Generalized State Constraints

Bruno Bouchard, Marcel Nutz

We provide a dynamic programming principle for stochastic optimal control problems with expectation constraints. A weak formulation, using test functions and a probabilistic relaxation of the constraint, avoids restrictions related to a measurable selection but still implies the Hamilton-Jacobi-Bellman equation in the viscosity sense. We treat open state constraints as a special case of expectation constraints and prove a comparison theorem to obtain the equation for closed state constraints.

3.9 | AI | Sep 6, 2023

Near-continuous time Reinforcement Learning for continuous state-action spaces

Lorenzo Croissant, Marc Abeille, Bruno Bouchard

We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
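
As a rough illustration of the interaction model described in this abstract, the sketch below samples the ticks of a Poisson clock of frequency $\varepsilon^{-1}$ (exponential gaps with mean $\varepsilon$) and evaluates the stated regret rate $\varepsilon^{1/2} T+\sqrt{T}$ with constants and logarithmic factors dropped. The function names `poisson_clock_times` and `regret_rate` are hypothetical and do not come from the paper; this is not the authors' algorithm, only a way to see how the interaction times densify and the approximation term shrinks as $\varepsilon\downarrow0$.

```python
import random


def poisson_clock_times(eps, horizon, seed=0):
    """Sample the tick times of a Poisson clock with rate 1/eps on [0, horizon].

    Gaps between consecutive ticks are i.i.d. exponential with mean eps,
    so a smaller eps means more frequent interactions.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(1.0 / eps)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0 / eps)
    return times


def regret_rate(eps, T):
    """Rate eps^(1/2) * T + sqrt(T) from the abstract, ignoring constants and logs."""
    return eps ** 0.5 * T + T ** 0.5


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01):
        ticks = poisson_clock_times(eps, horizon=100.0)
        print(eps, len(ticks), regret_rate(eps, 100.0))
```

With a fixed horizon, the number of ticks grows roughly like `horizon / eps`, while the first term of `regret_rate` decays like `eps ** 0.5`.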

1.2 | NA | Jul 28, 2017

Numerical approximation of BSDEs using local polynomial drivers and branching processes

Bruno Bouchard, Xiaolu Tan, Xavier Warin et al.

We propose a new numerical scheme for Backward Stochastic Differential Equations based on branching processes. We approximate an arbitrary (Lipschitz) driver by local polynomials and then use a Picard iteration scheme. Each step of the Picard iteration can be solved by using a representation in terms of branching diffusion systems, thus avoiding the need for a fine time discretization. In contrast to the previous literature on the numerical resolution of BSDEs based on branching processes, we prove the convergence of our numerical scheme without limitation on the time horizon. Numerical simulations are provided to illustrate the performance of the algorithm.
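
To make the Picard iteration mentioned in this abstract concrete, here is a minimal sketch on a deliberately simplified, noise-free analogue: the backward equation $Y_t = g + \int_t^T f(Y_s)\,ds$ with a Lipschitz driver $f$. Each Picard step plugs the previous iterate into the driver and integrates backwards from the terminal value. The sketch uses a trapezoid rule on a time grid instead of the branching-diffusion representation and does not approximate the driver by local polynomials, so it shows only the fixed-point structure, not the paper's scheme. The name `picard_backward` and its parameters are hypothetical.

```python
import math


def picard_backward(f, g, T, n_steps=200, n_iter=30):
    """Picard iteration for Y_t = g + int_t^T f(Y_s) ds on a uniform grid.

    Returns the values of the last iterate at the grid points t_i = i * T / n_steps.
    """
    h = T / n_steps
    y = [g] * (n_steps + 1)  # initial guess: constant equal to the terminal value
    for _ in range(n_iter):
        new = [g] * (n_steps + 1)  # terminal condition Y_T = g
        # Integrate backwards from T, evaluating the driver on the previous iterate.
        for i in range(n_steps - 1, -1, -1):
            new[i] = new[i + 1] + 0.5 * h * (f(y[i]) + f(y[i + 1]))
        y = new
    return y


if __name__ == "__main__":
    # Linear driver f(y) = -y with g = 1 and T = 1: the exact solution is exp(-(T - t)).
    y = picard_backward(lambda v: -v, g=1.0, T=1.0)
    print(y[0], math.exp(-1.0))
```

For the linear driver in the usage example, the value at time 0 can be compared against the closed-form solution $e^{-T}$.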
