LG AI BIO-PH CB
Jun 17, 2023

FP-IRL: Fokker-Planck Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes

Chengyang Huang, Siddhartha Srivastava, Kenneth K. Y. Ho, Kathy E. Luker, Gary D. Luker, Xun Huan, Krishna Garikipati

arXiv:2306.10407v2
1 citation
h-index: 35

Originality: Incremental advance

AI Analysis

FP-IRL addresses a bottleneck in IRL for systems with unknown dynamics, offering a method that is computationally efficient and physically interpretable. The advance appears incremental: it builds on existing IRL paradigms and extends them to Fokker-Planck dynamics.

The paper tackles inverse reinforcement learning (IRL) when transition functions are unknown. It proposes FP-IRL, a physics-constrained framework that infers the reward and transition functions simultaneously from trajectory data, without requiring sampled transitions. It reports accurate recovery of agent incentives on synthetic benchmarks and a modified Mountain Car problem.

Inverse reinforcement learning (IRL) is a powerful paradigm for uncovering the incentive structure that drives agent behavior, by inferring an unknown reward function from observed trajectories within a Markov decision process (MDP). However, most existing IRL methods require access to the transition function, either prescribed or estimated a priori, which poses significant challenges when the underlying dynamics are unknown, unobservable, or not easily sampled. We propose Fokker-Planck inverse reinforcement learning (FP-IRL), a novel physics-constrained IRL framework tailored for systems governed by Fokker-Planck (FP) dynamics. FP-IRL simultaneously infers both the reward and transition functions directly from trajectory data, without requiring access to sampled transitions. Our method leverages a conjectured equivalence between MDPs and the FP equation, linking reward maximization in MDPs with free energy minimization in FP dynamics. This connection enables inference of the potential function using our inference approach of variational system identification, from which the full set of MDP components -- reward, transition, and policy -- can be recovered using analytic expressions. We demonstrate the effectiveness of FP-IRL through experiments on synthetic benchmarks and a modified version of the Mountain Car problem. Our results show that FP-IRL achieves accurate recovery of agent incentives while preserving computational efficiency and physical interpretability.
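
As a rough illustration of what "FP dynamics" means in the abstract, the sketch below simulates particles in a one-dimensional potential with overdamped Langevin steps. The density of such particles obeys a Fokker-Planck equation and relaxes toward the Gibbs density proportional to exp(-beta * psi(x)), which is the free-energy minimizer. This is not the FP-IRL inference procedure: the quadratic potential, the function names (`grad_potential`, `simulate_langevin`), and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def grad_potential(x):
    # Hypothetical potential psi(x) = x**2 / 2 (an assumption), so psi'(x) = x.
    return x


def simulate_langevin(n_particles=1000, n_steps=500, dt=0.02,
                      beta=1.0, x0=2.0, seed=0):
    """Euler-Maruyama steps for dx = -psi'(x) dt + sqrt(2 / beta) dW.

    The particle density follows the Fokker-Planck equation
    dp/dt = d/dx(psi'(x) p) + (1 / beta) d^2 p / dx^2.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * dt / beta)
    xs = [x0] * n_particles  # all particles start at the same point
    for _ in range(n_steps):
        xs = [x - grad_potential(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


if __name__ == "__main__":
    xs = simulate_langevin()
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # For this quadratic potential the stationary density is Gaussian with
    # mean 0 and variance 1 / beta, so both estimates should land nearby.
    print(f"mean={mean:.3f}, variance={var:.3f}")
```

With the quadratic potential assumed here, the sample mean and variance should settle near 0 and 1/beta. FP-IRL works in the opposite direction: it starts from observed trajectories and infers the potential.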
