LG, SP
Sep 18, 2025

Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization

arXiv:2509.14848v1
h-index: 10
Originality: Incremental advance
AI Analysis

The paper addresses the cost of high-fidelity simulation when training RL policies for real-world applications, and it offers an incremental improvement over existing hybrid offline-online methods.

It considers policy optimization when several simulators of differing fidelity and cost are available. The proposed multi-fidelity hybrid RL algorithm chooses which fidelity to query by maximizing information gain, and the authors report that it outperforms existing benchmarks in empirical evaluations.

Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.
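The abstract does not spell out how fidelity selection works, so the following is only a generic sketch of the idea, not the paper's MF-HRL-IGM algorithm. It assumes that bootstrap resampling of offline returns gives a variance estimate for the quantity of interest, that each simulator returns a Gaussian-noised observation with a known noise variance, and that the fidelity is chosen by information gain per unit cost among those still affordable. The function names, the Gaussian model, and the gain-per-cost rule are all illustrative assumptions.

```python
import math
import random
from statistics import pvariance


def bootstrap_means(samples, n_boot, rng):
    """Return the means of n_boot resamples (with replacement) of samples."""
    n = len(samples)
    return [sum(rng.choice(samples) for _ in range(n)) / n for _ in range(n_boot)]


def gaussian_info_gain(prior_var, noise_var):
    """Mutual information (in nats) between a Gaussian quantity with variance
    prior_var and one observation of it corrupted by Gaussian noise_var."""
    return 0.5 * math.log(1.0 + prior_var / noise_var)


def select_fidelity(prior_var, noise_vars, costs, budget):
    """Pick the affordable fidelity with the largest information gain per unit
    cost (an assumed selection rule). Returns None if nothing is affordable."""
    best, best_score = None, -math.inf
    for m, (noise_var, cost) in enumerate(zip(noise_vars, costs)):
        if cost > budget:
            continue  # this simulator would exceed the remaining budget
        score = gaussian_info_gain(prior_var, noise_var) / cost
        if score > best_score:
            best, best_score = m, score
    return best


# Hypothetical data: offline returns and two simulators (cheap/noisy, costly/accurate).
rng = random.Random(0)
offline_returns = [0.8, 1.1, 0.9, 1.4, 1.0]
prior_var = pvariance(bootstrap_means(offline_returns, 200, rng))
choice = select_fidelity(prior_var, noise_vars=[0.5, 0.05], costs=[1.0, 10.0], budget=25.0)
```

Under these assumptions, a cheap simulator is preferred while its information per unit cost stays ahead, and the rule returns None once the remaining budget cannot cover any simulator.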
