LG · AI · Mar 23, 2025

Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization

arXiv:2503.18229v1 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses variance reduction in multi-fidelity RL for engineering design optimization, offering an incremental improvement over hierarchical methods.

The paper tackles high variance in policy learning for multi-fidelity reinforcement learning, which arises when the models have heterogeneous error distributions across the design space. It proposes an adaptive framework that dynamically draws on multiple non-hierarchical low-fidelity models alongside a high-fidelity model. In an octocopter design optimization problem, the results show substantial variance reduction and improved convergence, and the framework removes the need to manually tune model usage schedules.

Multi-fidelity Reinforcement Learning (RL) frameworks efficiently utilize computational resources by integrating analysis models of varying accuracy and costs. The prevailing methodologies, characterized by transfer learning, human-inspired strategies, control variate techniques, and adaptive sampling, predominantly depend on a structured hierarchy of models. However, this reliance on a model hierarchy can exacerbate variance in policy learning when the underlying models exhibit heterogeneous error distributions across the design space. To address this challenge, this work proposes a novel adaptive multi-fidelity RL framework, in which multiple heterogeneous, non-hierarchical low-fidelity models are dynamically leveraged alongside a high-fidelity model to efficiently learn a high-fidelity policy. Specifically, low-fidelity policies and their experience data are adaptively used for efficient targeted learning, guided by their alignment with the high-fidelity policy. The effectiveness of the approach is demonstrated in an octocopter design optimization problem, utilizing two low-fidelity models alongside a high-fidelity simulator. The results demonstrate that the proposed approach substantially reduces variance in policy learning, leading to improved convergence and consistent high-quality solutions relative to traditional hierarchical multi-fidelity RL methods. Moreover, the framework eliminates the need for manually tuning model usage schedules, which can otherwise introduce significant computational overhead. This positions the framework as an effective variance-reduction strategy for multi-fidelity RL, while also mitigating the computational and operational burden of manual fidelity scheduling.
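The abstract says that low-fidelity policies and their experience data are used adaptively, "guided by their alignment with the high-fidelity policy", but it does not state how alignment is measured. The sketch below is one possible reading, not the authors' method: it assumes discrete action distributions, scores each low-fidelity policy by its mean KL divergence from the high-fidelity policy on a shared batch of states, and turns the scores into weights with a softmax. The names `alignment_weights`, `allocate_samples`, `lf_a`, `lf_b`, and the `temperature` parameter are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def alignment_weights(high_probs, low_probs_by_model, temperature=1.0):
    """Weight each low-fidelity policy by how closely it matches the
    high-fidelity policy (ASSUMPTION: alignment = negative mean KL).

    high_probs: per-state action distributions from the high-fidelity policy.
    low_probs_by_model: dict mapping a model name to its per-state action
        distributions, evaluated on the same states.
    """
    scores = {}
    for name, low_probs in low_probs_by_model.items():
        mean_kl = sum(
            kl_divergence(h, l) for h, l in zip(high_probs, low_probs)
        ) / len(high_probs)
        scores[name] = -mean_kl / temperature

    # Numerically stable softmax over the alignment scores.
    max_score = max(scores.values())
    exp_scores = {n: math.exp(s - max_score) for n, s in scores.items()}
    total = sum(exp_scores.values())
    return {n: v / total for n, v in exp_scores.items()}


def allocate_samples(weights, budget):
    """Split a low-fidelity sample budget in proportion to the weights."""
    alloc = {n: int(budget * w) for n, w in weights.items()}
    # Rounding down can leave a remainder; give it to the best-aligned model.
    best = max(weights, key=weights.get)
    alloc[best] += budget - sum(alloc.values())
    return alloc


# Toy example: two states, three actions, two hypothetical low-fidelity models.
high = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
low = {
    "lf_a": [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],  # close to the high-fidelity policy
    "lf_b": [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],  # far from it
}
weights = alignment_weights(high, low)
allocation = allocate_samples(weights, budget=100)
```

Because the weights are recomputed from the current policies, the split between low-fidelity models can shift during training without a hand-written fidelity schedule, which is the property the abstract emphasizes. Any other divergence or agreement measure could be substituted for the KL term.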

