LG · ML · May 27

Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

Guang-Yuan Hao, Lars van der Laan, Aurélien Bibaut, Nathan Kallus

arXiv:2605.27834 · 83.0 · h-index: 11

Predicted impact: top 13% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For researchers in transfer learning and reinforcement learning, this work provides a theoretically grounded method to improve reward transfer across environments, though the improvement is incremental over existing sequential approaches.

The paper addresses transferring a reward learned by inverse reinforcement learning in a source environment to reinforcement learning in a target environment. The proposed coupled minimax approach removes the first-order influence of source Bellman residual error, which the sequential approach retains. The paper derives finite-sample soft-$q$-function error bounds and regret guarantees, and validates the comparison on a sepsis simulator.

We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.
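To make the sequential baseline described in the abstract concrete, here is a minimal tabular sketch in Python. It is an illustration under strong assumptions, not the paper's minimax estimator: the source soft-$q$-function and both transition kernels are taken as known exactly, state and action spaces are finite, and all function names (`soft_value`, `reward_from_soft_q`, `solve_soft_q`, `soft_policy`) are hypothetical. It uses the standard soft Bellman equation $q(s,a) = r(s,a) + \gamma\, \mathbb{E}[\log \sum_{a'} \exp q(s',a')]$ with unit temperature.

```python
import numpy as np

def soft_value(q):
    """v(s) = log sum_a exp q(s, a), computed with a max shift for stability."""
    m = q.max(axis=1)
    return m + np.log(np.exp(q - m[:, None]).sum(axis=1))

def reward_from_soft_q(q_src, P_src, gamma):
    """Step 1 (source): invert the soft Bellman equation, r = q - gamma * P v.

    P_src has shape (S, A, S); P_src @ v has shape (S, A).
    """
    return q_src - gamma * P_src @ soft_value(q_src)

def solve_soft_q(r, P, gamma, iters=2000, tol=1e-10):
    """Step 2 (target): soft value iteration q <- r + gamma * P v(q)."""
    q = np.zeros_like(r)
    for _ in range(iters):
        q_new = r + gamma * P @ soft_value(q)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q

def soft_policy(q):
    """Soft-control policy pi(a | s) = exp(q(s, a) - v(s))."""
    return np.exp(q - soft_value(q)[:, None])

# Toy example with made-up dynamics (assumption: 4 states, 2 actions).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P_src = rng.dirichlet(np.ones(S), size=(S, A))  # source transitions
P_tgt = rng.dirichlet(np.ones(S), size=(S, A))  # different target transitions
r_true = rng.normal(size=(S, A))

q_src = solve_soft_q(r_true, P_src, gamma)       # stand-in for an IRL estimate
r_hat = reward_from_soft_q(q_src, P_src, gamma)  # plug-in reward
q_tgt = solve_soft_q(r_hat, P_tgt, gamma)        # target soft-q
pi_tgt = soft_policy(q_tgt)                      # target soft-control policy
```

In this idealized setting the plug-in reward matches the true reward up to numerical tolerance. The abstract's point concerns the estimated case: any Bellman residual error in the source step is carried into the target problem through the plugged-in reward, which is the first-order effect the coupled approach is said to remove.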

