Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
This work examines how offline RL algorithms perform in practical, noisy environments, a question relevant to both researchers and practitioners. The contribution is incremental, however, because it focuses on benchmarking existing methods.
The study compared model-free, model-based, and hybrid offline reinforcement learning algorithms on industrial benchmark datasets that exhibit real-world complexities such as noise and partial observability. It found that simpler algorithms, such as rollout-based methods or model-free methods with simple regularizers, performed best.
Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and there is no noise. We compare model-free, model-based, and hybrid offline RL approaches on various industrial benchmark (IB) datasets to test the algorithms in settings closer to real-world problems, including complex noise and partially observable states. We find that on the IB, hybrid approaches face severe difficulties and that simpler algorithms, such as rollout-based algorithms or model-free algorithms with simpler regularizers, perform best on the datasets.
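To make the term "rollout-based" concrete, the following is a minimal sketch of the general idea, not the procedure or any algorithm evaluated in this work. It assumes a hypothetical one-dimensional noisy system standing in for a real benchmark, a linear dynamics model fitted by least squares, and a handful of candidate linear policies; all function names, the dynamics, and the reward are illustrative assumptions. The sketch fits a model to logged transitions and then ranks candidate policies purely by their returns under model rollouts, without further interaction with the environment.

```python
import random

def collect_offline_data(n, seed=0):
    """Log (s, a, s_next) transitions from a random behavior policy.

    The dynamics s_next = 0.9*s + 0.5*a + noise are a hypothetical toy
    system chosen for illustration, not the industrial benchmark.
    """
    rng = random.Random(seed)
    data = []
    s = rng.uniform(-1.0, 1.0)
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        s_next = 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.05)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_linear_model(data):
    """Least-squares fit of s_next ~= w_s*s + w_a*a via the 2x2 normal equations."""
    sss = sum(s * s for s, a, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    saa = sum(a * a for s, a, _ in data)
    ssy = sum(s * y for s, _, y in data)
    say = sum(a * y for _, a, y in data)
    det = sss * saa - ssa * ssa
    w_s = (ssy * saa - say * ssa) / det
    w_a = (say * sss - ssy * ssa) / det
    return w_s, w_a

def rollout_return(model, gain, start_states, horizon=20):
    """Average return of the policy a = -gain*s under the learned model.

    The reward -s_next**2 (keep the state near zero) is an assumed objective.
    """
    w_s, w_a = model
    total = 0.0
    for s in start_states:
        for _ in range(horizon):
            # Clip actions to the range covered by the logged data.
            a = max(-1.0, min(1.0, -gain * s))
            s = w_s * s + w_a * a
            total -= s * s
    return total / len(start_states)

data = collect_offline_data(500)
model = fit_linear_model(data)
start_states = [s for s, _, _ in data[:50]]  # start rollouts from logged states

candidate_gains = [0.0, 0.5, 1.0, 1.8, 3.0]
scores = {k: rollout_return(model, k, start_states) for k in candidate_gains}
best_gain = max(scores, key=scores.get)
```

Clipping the actions to the range seen in the data is one simple way to keep rollouts inside the region where the learned model has support; practical methods typically use more careful mechanisms for the same purpose.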