Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games
This work addresses the problem of learning complex reward structures in mean-field games for applications such as traffic routing, although the contribution is incremental: it extends existing methods to infinite-horizon and nonlinear-reward settings.
The paper tackles inverse reinforcement learning for infinite-horizon stationary mean-field games by modeling the reward function in a reproducing kernel Hilbert space, which enables the inference of nonlinear rewards from expert demonstrations. It demonstrates the method's effectiveness by accurately recovering expert behavior in a traffic routing game.
We consider the maximum causal entropy inverse reinforcement learning problem for infinite-horizon stationary mean-field games, in which we model the unknown reward function within a reproducing kernel Hilbert space. This allows rich and potentially nonlinear reward structures to be inferred directly from expert demonstrations, in contrast to most existing inverse reinforcement learning approaches for mean-field games, which typically restrict the reward function to a linear combination of a fixed, finite set of basis functions. We also consider an infinite-horizon cost structure, whereas prior studies rely primarily on finite-horizon formulations. We introduce a Lagrangian relaxation of this maximum causal entropy inverse reinforcement learning problem that lets us reformulate it as an unconstrained log-likelihood maximization problem, which we solve with a gradient ascent algorithm. To establish the theoretical consistency of the algorithm, we prove that the log-likelihood objective is smooth by showing that the associated soft Bellman operators are Fréchet differentiable with respect to the parameters in the reproducing kernel Hilbert space. We demonstrate the effectiveness of our method on a mean-field traffic routing game, where it accurately recovers expert behavior.
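To make the pipeline concrete (kernel reward, soft Bellman fixed point, penalized log-likelihood, gradient ascent), here is a minimal finite-dimensional sketch. It is not the authors' implementation and rests on several loudly stated assumptions: finite state and action sets; the mean field `mu` is frozen at a given distribution rather than solved as a stationary equilibrium; the reward is expanded over kernel sections at every state-action point, so the RKHS norm becomes `alpha^T K alpha`; the features `Z` (normalized state, action, and expected population density at the next state, standing in for congestion) are hypothetical; and the objective is a generic state-action log-likelihood with a ridge penalty rather than the paper's exact Lagrangian-relaxed objective. All function names are illustrative. The gradient is obtained by implicitly differentiating the soft Bellman fixed point, a finite-dimensional analogue of the Fréchet differentiability result.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True)), axis=axis)


def gaussian_gram(X, Y, bandwidth):
    """Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def soft_bellman_fixed_point(r, P, gamma, tol=1e-13, max_iter=10_000):
    """Iterate V <- logsumexp_a [r + gamma * P V], a gamma-contraction in sup norm.

    r: (S, A) rewards; P: (S, A, S) transitions, with the mean field held fixed.
    Returns soft Q (S, A), soft V (S,) and the softmax policy pi (S, A).
    """
    V = np.zeros(r.shape[0])
    for _ in range(max_iter):
        V_new = logsumexp(r + gamma * (P @ V), axis=1)
        done = np.max(np.abs(V_new - V)) < tol * (1.0 + np.max(np.abs(V_new)))
        V = V_new
        if done:
            break
    Q = r + gamma * (P @ V)
    V = logsumexp(Q, axis=1)
    return Q, V, np.exp(Q - V[:, None])


def objective_and_grad(alpha, K, P, gamma, counts, lam):
    """Penalized log-likelihood sum(counts * log pi) - (lam / 2) * alpha^T K alpha and its gradient.

    Hypothetical parameterization: reward f = K @ alpha on the finite state-action set.
    """
    S, A = counts.shape
    n = S * A
    Q, V, pi = soft_bellman_fixed_point((K @ alpha).reshape(S, A), P, gamma)
    ll = np.sum(counts * (Q - V[:, None])) - 0.5 * lam * alpha @ K @ alpha
    R = np.repeat(np.eye(S), A, axis=0)            # (n, S): row (s, a) selects state s
    Pi = (R * pi.reshape(n, 1)).T                  # (S, n): Pi[s, (s, a)] = pi(a | s)
    J = np.eye(n) - gamma * P.reshape(n, S) @ Pi   # dQ/dr = J^{-1} at the fixed point
    D = np.eye(n) - R @ Pi                         # d log pi = D dQ
    grad_r = np.linalg.solve(J.T, D.T @ counts.reshape(n))
    return ll, K @ grad_r - lam * (K @ alpha)


# Toy instance (all numbers hypothetical): 5 states, 3 actions, frozen mean field mu.
rng = np.random.default_rng(0)
S, A, gamma, lam = 5, 3, 0.9, 1e-3
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
mu = rng.dirichlet(np.ones(S))              # population distribution, held fixed
# Illustrative features: normalized (s, a) and expected population density at the next state.
Z = np.array([[s / S, a / A, P[s, a] @ mu] for s in range(S) for a in range(A)])
K = gaussian_gram(Z, Z, bandwidth=0.5)
true_r = (K @ rng.normal(size=S * A)).reshape(S, A)
_, _, pi_expert = soft_bellman_fixed_point(true_r, P, gamma)
counts = 10.0 * pi_expert                   # expected counts from 10 visits per state

# Gradient ascent with simple backtracking, so the objective never decreases.
alpha, step = np.zeros(S * A), 0.1
ll, g = objective_and_grad(alpha, K, P, gamma, counts, lam)
history = [ll]
for _ in range(200):
    ll_new, g_new = objective_and_grad(alpha + step * g, K, P, gamma, counts, lam)
    if ll_new >= ll:  # accept the step and enlarge it
        alpha, ll, g, step = alpha + step * g, ll_new, g_new, step * 1.5
        history.append(ll)
    else:             # reject the step and shrink it
        step *= 0.5
print(f"log-likelihood: {history[0]:.3f} -> {history[-1]:.3f}")
```

The linear solve with `J = I - gamma * P Pi` differentiates through the soft Bellman fixed point exactly for the discounted problem, but it costs cubic time in the number of state-action pairs; larger instances would call for iterative solvers, and a faithful mean-field treatment would also update `mu` from the learned policy.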