Inverse Transition Learning: Learning Dynamics from Demonstrations
This work addresses the challenge of learning dynamics from limited expert data in applications such as healthcare, though it appears incremental because it builds on existing constraint-based and Bayesian methods.
The paper tackles the problem of estimating transition dynamics from near-optimal expert trajectories in offline model-based reinforcement learning. In synthetic environments and real healthcare scenarios such as ICU patient management, the method yields significant improvements in decision-making, and its posterior can indicate when transfer will be successful.
We consider the problem of estimating the transition dynamics $T^*$ from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a \emph{feature}: we use the fact that the expert is near-optimal to inform our estimate of $T^*$. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios such as hypotension management in the Intensive Care Unit (ICU), we demonstrate not only significant improvements in decision-making, but also that our posterior can inform when transfer will be successful.
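
To make the central idea concrete, the following is a minimal sketch of how near-optimality of the expert could constrain a candidate transition model. It is an illustration only and is not claimed to be the algorithm of the paper. It assumes a tabular MDP with a known reward table `R`, a discount `gamma`, and a tolerance `eps` on how suboptimal an expert action may be; the helper names `q_values` and `is_consistent` and the two-state toy MDP are hypothetical.

```python
import numpy as np

def q_values(T, R, gamma=0.9, iters=500):
    """Value iteration on a tabular MDP. T has shape (S, A, S), R has shape (S, A)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values, shape (S,)
        Q = R + gamma * (T @ V)    # Bellman optimality backup, shape (S, A)
    return Q

def is_consistent(T, R, expert_sa, eps=0.05, gamma=0.9):
    """True if every observed expert action is within eps of optimal under T."""
    Q = q_values(T, R, gamma)
    return all(Q[s, a] >= Q[s].max() - eps for s, a in expert_sa)

# Hypothetical toy MDP: state 1 is rewarding and absorbing.
R = np.array([[0.0, 0.0], [1.0, 1.0]])
T_good = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # state 0: action 0 mostly stays, action 1 mostly reaches state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: both actions stay in state 1
])
T_bad = T_good.copy()
T_bad[0] = T_good[0, ::-1]     # swap the effects of the two actions in state 0

expert_sa = [(0, 1)]           # the expert was observed taking action 1 in state 0

print(is_consistent(T_good, R, expert_sa))  # True
print(is_consistent(T_bad, R, expert_sa))   # False
```

Under these assumptions, a single expert observation already rules out `T_bad`, even though the expert data contain no sample of action 0 in state 0. A check of this kind could, for example, be used to filter or reweight samples of $T$ in a Bayesian treatment, though how the constraints are actually integrated is a matter for the full paper.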