1.8NAMay 23
Explicit Runge-Kutta schemes for Backward Stochastic Differential EquationsShuixin Fang, Yue Qiu, Weidong Zhao
The Butcher theory provides a powerful tool for analyzing order conditions of Runge-Kutta schemes for ordinary differential equations (ODEs); however, such a theory has not yet been well established for backward stochastic differential equations (BSDEs) -- motivating the current work to address this gap. Specifically, we propose a new class of explicit Runge-Kutta schemes for BSDEs. These schemes admit a concise formulation that closely mirrors their ODE counterparts. Building on this formulation, we extend the Butcher theory to the proposed schemes, thereby enabling a symbolic derivation of Taylor expansions for the local truncation errors, and yielding the order conditions. Our approach preserves the elegance and generality of the original Butcher theory: it avoids stage-by-stage error expansions and provides a systematic, stage-inductive analysis, applicable to schemes with any number of stages and any target order. Numerical experiments support the theoretical results.
34.6NAApr 29
Deep Policy Iteration for High-Dimensional Mean-Field Games with Regenerative ReformulationShuixin Fang, Shupeng Wang, Zhen Wu et al.
This paper develops a deep policy iteration method for high-dimensional finite-horizon mean-field games. We reformulate the game as a regenerative problem with deterministic cycles, which allows policy evaluation (PE), policy improvement (PI), and population measure estimation to be carried out cycle by cycle. Within this formulation, we approximate the population measure by a particle system and update it using a one-step random mapping induced by the Euler-Maruyama discretization of the state dynamics. This update transports a mini-batch of particles from one cycle to the next, avoiding sequential trajectory simulation over the entire time horizon at each iteration. The PE and PI subproblems are formulated through the relation between consecutive cycles, with adversarial training used for evaluation and averaged optimization used for improvement. The resulting method is efficient and scalable in high dimensions, as it avoids the direct solution of the coupled Hamilton-Jacobi-Bellman and Fokker-Planck system, the full simulation of trajectories to estimate the population measure, the explicit computation of conditional expectations in policy evaluation, and pointwise optimization in policy improvement. Numerical experiments demonstrate that the proposed method effectively handles dimensions up to 10,000.