LG AI · Oct 17, 2025

OffSim: Offline Simulator for Model-based Offline Inverse Reinforcement Learning

Woo-Jin Ahn, Sang-Ryul Baek, Yong-Jun Lee, Hyun-Duck Choi, Myo-Taeg Lim

arXiv:2510.15495v1 · 4.1 · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating simulator and reward design for reinforcement learning practitioners, though it appears to be an incremental extension of existing offline IRL techniques.

The authors tackled the problem of time-consuming simulator development and manual reward specification in reinforcement learning by proposing OffSim, a model-based offline inverse reinforcement learning framework that learns environmental dynamics and reward functions from expert trajectories. Their method achieved substantial performance gains over existing offline IRL methods in MuJoCo experiments.

Reinforcement learning algorithms typically utilize an interactive simulator (i.e., environment) with a predefined reward function for policy training. Developing such simulators and manually defining reward functions, however, is often time-consuming and labor-intensive. To address this, we propose an Offline Simulator (OffSim), a novel model-based offline inverse reinforcement learning (IRL) framework, to emulate environmental dynamics and reward structure directly from expert-generated state-action trajectories. OffSim jointly optimizes a high-entropy transition model and an IRL-based reward function to enhance exploration and improve the generalizability of the learned reward. Leveraging these learned components, OffSim can subsequently train a policy offline without further interaction with the real environment. Additionally, we introduce OffSim$^+$, an extension that incorporates a marginal reward for multi-dataset settings to enhance exploration. Extensive MuJoCo experiments demonstrate that OffSim achieves substantial performance gains over existing offline IRL methods, confirming its efficacy and robustness.
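To make the idea of an "offline simulator" concrete, here is a minimal sketch of the general pattern the abstract describes: fit a transition model and a reward function from logged expert (state, action, next state) tuples, then expose them through a `step` interface that a policy could be trained against without touching the real environment. This is not the OffSim algorithm. A linear least-squares model stands in for the high-entropy transition model, and a negative distance to a behavior-cloned expert action stands in for the IRL-based reward. All function names, class names, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ [s, a, 1] @ weights, plus residual std per dimension."""
    inputs = np.hstack([states, actions, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    noise_std = (next_states - inputs @ weights).std(axis=0)
    return weights, noise_std


def fit_linear_expert(states, actions):
    """Behavior-cloned linear map a ~ [s, 1] @ gain (stand-in for a learned reward model)."""
    inputs = np.hstack([states, np.ones((len(states), 1))])
    gain, *_ = np.linalg.lstsq(inputs, actions, rcond=None)
    return gain


class LearnedSimulator:
    """Hypothetical stand-in for an environment, built only from offline data."""

    def __init__(self, weights, noise_std, expert_gain, seed=0):
        self.weights = weights
        self.noise_std = noise_std
        self.expert_gain = expert_gain
        self.rng = np.random.default_rng(seed)

    def expert_action(self, state):
        return np.append(state, 1.0) @ self.expert_gain

    def reward(self, state, action):
        # Assumption: reward is higher the closer the action is to the cloned expert's.
        return -float(np.sum((action - self.expert_action(state)) ** 2))

    def step(self, state, action):
        # Sample the next state from the fitted Gaussian transition model.
        mean = np.concatenate([state, action, [1.0]]) @ self.weights
        next_state = mean + self.rng.normal(0.0, self.noise_std)
        return next_state, self.reward(state, action)


# Synthetic "expert" data (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
n, state_dim, action_dim = 500, 3, 2
true_A = 0.9 * np.eye(state_dim)
true_B = 0.1 * rng.normal(size=(action_dim, state_dim))
true_gain = 0.5 * rng.normal(size=(state_dim, action_dim))

states = rng.normal(size=(n, state_dim))
# Small action noise keeps states and actions from being perfectly collinear.
actions = states @ true_gain + 0.1 * rng.normal(size=(n, action_dim))
next_states = states @ true_A + actions @ true_B + 0.01 * rng.normal(size=(n, state_dim))

weights, noise_std = fit_linear_dynamics(states, actions, next_states)
sim = LearnedSimulator(weights, noise_std, fit_linear_expert(states, actions))

state = states[0]
next_state, reward = sim.step(state, sim.expert_action(state))
```

A policy optimizer would call `sim.step` in place of a real environment. In the paper's setting, both components are learned jointly and the reward comes from IRL rather than from action matching as assumed here.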
