LG · Jan 6, 2024

An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

arXiv:2401.03163v1 · 2 citations · h-index: 27 · Knowledge engineering review (Print)
Originality: Incremental advance
AI Analysis

The paper addresses a specific technical bottleneck in MORL for stochastic settings, but the advance is incremental because it builds directly on prior research.

The paper investigated why value-based multi-objective reinforcement learning algorithms often fail to learn optimal policies in stochastic environments, finding that noisy Q-value estimates critically undermine stability and convergence.

One common approach to solving multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However, issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research, providing a detailed examination of the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.
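The general idea of vector Q-values combined with a utility function can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the Q-table layout (nested lists indexed by state, action, objective), the function names, and the linear utility are all assumptions chosen for illustration, and the paper's own variants and utility functions may differ.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Utility = Callable[[Sequence[float]], float]


def linear_utility(weights: Sequence[float]) -> Utility:
    """Hypothetical utility: a weighted sum over objectives (illustration only)."""
    def utility(values: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(weights, values))
    return utility


def greedy_action(q_row: Sequence[Vector], utility: Utility) -> int:
    """Return the action whose vector Q-value scores highest under the utility."""
    return max(range(len(q_row)), key=lambda a: utility(q_row[a]))


def mo_q_update(q, state, action, reward, next_state, done,
                utility, alpha=0.1, gamma=1.0):
    """One vector-valued Q-learning update (a sketch under the assumptions above).

    The utility function is used only to pick the greedy next action;
    the Q-values themselves stay as vectors, one entry per objective.
    """
    if done:
        target = list(reward)
    else:
        best = greedy_action(q[next_state], utility)
        target = [r + gamma * qn
                  for r, qn in zip(reward, q[next_state][best])]
    q[state][action] = [old + alpha * (t - old)
                        for old, t in zip(q[state][action], target)]


# Usage: two states, two actions, two objectives.
q_table = [[[0.0, 0.0], [0.0, 0.0]],
           [[2.0, 0.0], [0.0, 3.0]]]
prefer_first = linear_utility([1.0, 0.0])
mo_q_update(q_table, state=0, action=0, reward=[1.0, 1.0],
            next_state=1, done=False, utility=prefer_first, alpha=1.0)
```

Because action selection depends on the utility of the current estimates, errors in those estimates can change which action looks greedy, which is consistent with the abstract's emphasis on noisy Q-value estimates.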

