Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning
This work addresses unstable or suboptimal exploration in sparse-reward RL, a problem relevant to both researchers and practitioners, and represents an incremental improvement over existing methods.
The paper addresses the problem of balancing intrinsic and extrinsic rewards for exploration in sparse-reward reinforcement learning. It proposes ACWI, an adaptive intrinsic reward scaling framework that learns state-dependent coefficients online, and reports improved sample efficiency and stability over fixed-coefficient baselines in MiniGrid environments.
We propose ACWI (Adaptive Correlation-Weighted Intrinsic), an adaptive intrinsic reward scaling framework designed to dynamically balance intrinsic and extrinsic rewards for improved exploration in sparse-reward reinforcement learning. Conventional approaches rely on manually tuned scalar coefficients, which often yield unstable or suboptimal performance across tasks; ACWI instead learns a state-dependent scaling coefficient online. Specifically, ACWI introduces a lightweight Beta Network that predicts the intrinsic reward weight directly from the agent state through an encoder-based architecture. The scaling mechanism is optimized using a correlation-based objective that encourages alignment between the weighted intrinsic rewards and the discounted future extrinsic returns. This formulation enables task-adaptive exploration incentives while preserving computational efficiency and training stability. We evaluate ACWI on a suite of sparse-reward environments in MiniGrid. Experimental results demonstrate that ACWI consistently improves sample efficiency and learning stability compared to fixed intrinsic reward baselines, achieving superior performance with minimal computational overhead.
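The correlation-based objective described above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything here is an assumption made for illustration: the use of Pearson correlation, the negative-correlation loss, the single-episode batch, and the function names. In particular, `beta` is a hypothetical stand-in (a sigmoid over a linear map) for the encoder-based Beta Network, not the authors' architecture.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def pearson(xs, ys, eps=1e-8):
    """Pearson correlation of two equal-length sequences (eps avoids division by zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(var_x * var_y) + eps)


def beta(state, weights, bias):
    """Hypothetical stand-in for the Beta Network: a weight in (0, 1) per state."""
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def correlation_loss(states, intrinsic, extrinsic, weights, bias, gamma=0.99):
    """Assumed objective: negative correlation between the weighted intrinsic
    rewards beta(s_t) * r_int_t and the discounted extrinsic returns G_t."""
    weighted = [beta(s, weights, bias) * r for s, r in zip(states, intrinsic)]
    returns = discounted_returns(extrinsic, gamma)
    return -pearson(weighted, returns)


def shaped_reward(state, r_ext, r_int, weights, bias):
    """Reward passed to the policy: extrinsic plus state-weighted intrinsic."""
    return r_ext + beta(state, weights, bias) * r_int
```

Minimizing `correlation_loss` with respect to `weights` and `bias` pushes the weighted intrinsic signal to rise and fall with future extrinsic return. In practice the gradient step would be taken with an automatic differentiation library; that part is omitted here.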