LGFeb 3
From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement LearningYao-Hui Li, Zeyu Wang, Xin Li et al.
Model-based reinforcement learning (MBRL) achieves high sample efficiency by simulating future trajectories with learned dynamics and reward models. However, its effectiveness is severely compromised in sparse reward settings. The core limitation lies in the standard paradigm of regressing ground-truth scalar rewards: in sparse environments, this yields a flat, gradient-free landscape that fails to provide directional guidance for planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting scalars to constructing informative potential landscapes. SLOPE employs optimistic distributional regression to estimate high-confidence upper bounds, which amplifies rare success signals and ensures sufficient exploration gradients. Evaluations on 30+ tasks across 5 benchmarks demonstrate that SLOPE consistently outperforms leading baselines in fully sparse, semi-sparse, and dense rewards.
LGFeb 3, 2025
Learning Fused State Representations for Control from Multi-View ObservationsZeyu Wang, Yao-Hui Li, Xin Li et al.
Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC's robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance.
LGOct 6, 2025
Wavelet Predictive Representations for Non-Stationary Reinforcement LearningMin Wang, Xin Li, Ye He et al.
The real world is inherently non-stationary, with ever-changing factors, such as weather conditions and traffic flows, making it challenging for agents to adapt to varying environmental dynamics. Non-Stationary Reinforcement Learning (NSRL) addresses this challenge by training agents to adapt rapidly to sequences of distinct Markov Decision Processes (MDPs). However, existing NSRL approaches often focus on tasks with regularly evolving patterns, leading to limited adaptability in highly dynamic settings. Inspired by the success of Wavelet analysis in time series modeling, specifically its ability to capture signal trends at multiple scales, we propose WISDOM to leverage wavelet-domain predictive task representations to enhance NSRL. WISDOM captures these multi-scale features in evolving MDP sequences by transforming task representation sequences into the wavelet domain, where wavelet coefficients represent both global trends and fine-grained variations of non-stationary changes. In addition to the auto-regressive modeling commonly employed in time series forecasting, we devise a wavelet temporal difference (TD) update operator to enhance tracking and prediction of MDP evolution. We theoretically prove the convergence of this operator and demonstrate policy improvement with wavelet task representations. Experiments on diverse benchmarks show that WISDOM significantly outperforms existing baselines in both sample efficiency and asymptotic performance, demonstrating its remarkable adaptability in complex environments characterized by non-stationary and stochastically evolving tasks.
LGJul 24, 2025
Revisiting Bisimulation Metric for Robust Representations in Reinforcement LearningLeiji Zhang, Zeyu Wang, Xin Li et al.
Bisimulation metric has long been regarded as an effective control-related representation learning technique in various reinforcement learning tasks. However, in this paper, we identify two main issues with the conventional bisimulation metric: 1) an inability to represent certain distinctive scenarios, and 2) a reliance on predefined weights for differences in rewards and subsequent states during recursive updates. We find that the first issue arises from an imprecise definition of the reward gap, whereas the second issue stems from overlooking the varying importance of reward difference and next-state distinctions across different training stages and task settings. To address these issues, by introducing a measure for state-action pairs, we propose a revised bisimulation metric that features a more precise definition of reward gap and novel update operators with adaptive coefficient. We also offer theoretical guarantees of convergence for our proposed metric and its improved representation distinctiveness. In addition to our rigorous theoretical analysis, we conduct extensive experiments on two representative benchmarks, DeepMind Control and Meta-World, demonstrating the effectiveness of our approach.