LG · Dec 4, 2024

Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning

arXiv:2412.03258v2 · 4 citations · h-index: 2 · ICLR
Originality: Incremental advance
AI Analysis

The paper addresses a common challenge in offline RL that affects both researchers and practitioners. Its contribution is an incremental improvement: by learning from a single mode of the data, it avoids the suboptimal performance that results from averaging over multi-modal behaviour.

The paper tackles multi-modal action distributions in offline reinforcement learning, where existing methods often assume a unimodal behaviour policy and perform poorly when that assumption is violated. The proposed LOM method learns from a single promising mode of the behaviour policy. It outperforms existing methods on the D4RL benchmarks and is shown to be effective in complex, multi-modal scenarios.

Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets without interacting with the environment. A common challenge is handling multi-modal action distributions, where multiple behaviours are represented in the data. Existing methods often assume unimodal behaviour policies, leading to suboptimal performance when this assumption is violated. We propose weighted imitation Learning on One Mode (LOM), a novel approach that focuses on learning from a single, promising mode of the behaviour policy. By using a Gaussian mixture model to identify modes and selecting the best mode based on expected returns, LOM avoids the pitfalls of averaging over conflicting actions. Theoretically, we show that LOM improves performance while maintaining simplicity in policy learning. Empirically, LOM outperforms existing methods on standard D4RL benchmarks and demonstrates its effectiveness in complex, multi-modal scenarios.
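The mode-selection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a state-free toy problem with 1-D actions, a hand-written EM routine (`fit_gmm_1d`, a hypothetical helper), synthetic data, and a responsibility-weighted average of observed returns as the stand-in for "expected return" of each mode.

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with plain EM (toy stand-in for a learned mixture)."""
    # Deterministic initialisation: place the means at evenly spaced quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    stds = np.full(k, x.std() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: per-sample responsibilities (constant terms cancel after normalising).
        log_p = (-0.5 * ((x[:, None] - means) / stds) ** 2
                 - np.log(stds) + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from the responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
        weights = nk / len(x)
    return means, stds, weights, resp

# Synthetic bimodal behaviour data (assumption): two conflicting action modes,
# where the mode near +1 earns a higher return than the mode near -1.
rng = np.random.default_rng(0)
actions = np.concatenate([rng.normal(-1.0, 0.1, 500), rng.normal(1.0, 0.1, 500)])
returns = np.where(actions > 0, 1.0, 0.5) + rng.normal(0.0, 0.05, actions.size)

# A unimodal fit averages the two modes and lands between them, where no data lies.
naive_action = actions.mean()

# Identify modes, score each by its responsibility-weighted mean return, keep the best.
means, stds, weights, resp = fit_gmm_1d(actions)
mode_returns = (resp * returns[:, None]).sum(axis=0) / resp.sum(axis=0)
best = int(np.argmax(mode_returns))
one_mode_action = means[best]

print(f"averaged action: {naive_action:.2f}, selected-mode action: {one_mode_action:.2f}")
```

In this toy setting the averaged action falls near zero, between the two modes, while the selected-mode action stays inside the higher-return cluster. A state-conditioned mixture and a learned value estimate would be needed for anything beyond this illustration.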

Code Implementations: 1 repo