LG | Feb 13, 2025

Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning

arXiv:2502.09298v2 | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses performance and robustness issues in DRL for POMDPs, a problem specific to reinforcement learning research. It is an incremental advance: it builds on existing DRL methods by adding convexity constraints.

The paper improves Deep Reinforcement Learning (DRL) for Partially Observable Markov Decision Processes (POMDPs) by exploiting the convexity of the value function over the belief space, introducing hard- and soft-enforced convexity as two approaches. Compared against standard DRL on the Tiger and FieldVisionRockSample environments, including convexity can substantially increase agent performance and robustness over the hyperparameter space, especially on out-of-distribution domains.

We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.
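
To make the convexity property concrete, the following is a minimal sketch and not the paper's implementation. It assumes a two-state belief, a hypothetical set of linear value pieces (`alphas`), and two illustrative helpers: `convex_value`, which is convex by construction because it is a maximum of linear functions of the belief (one possible reading of "hard" enforcement), and `convexity_violation`, a Jensen-gap penalty that is zero for convex functions and positive otherwise (one possible reading of "soft" enforcement). All names and numbers are assumptions chosen for illustration.

```python
import numpy as np


def convex_value(belief, alphas):
    """Value as a max over linear functions of the belief (convex by construction)."""
    return float(np.max(alphas @ belief))


def convexity_violation(value_fn, b1, b2, lam=0.5):
    """Jensen gap between two beliefs: 0 if value_fn is convex on this pair, > 0 otherwise."""
    mix = lam * b1 + (1.0 - lam) * b2
    gap = value_fn(mix) - (lam * value_fn(b1) + (1.0 - lam) * value_fn(b2))
    return max(0.0, float(gap))


# Hypothetical linear value pieces for a two-state problem (illustrative numbers only).
alphas = np.array([
    [10.0, -100.0],
    [-100.0, 10.0],
    [-1.0, -1.0],
])

b_left = np.array([1.0, 0.0])    # certain the state is the first one
b_right = np.array([0.0, 1.0])   # certain the state is the second one
b_unsure = np.array([0.5, 0.5])  # maximally uncertain belief


def hard_convex(b):
    """Convex by construction."""
    return convex_value(b, alphas)


def non_convex(b):
    """A concave function of the belief, used only to trigger the penalty."""
    return -float(np.sum(b ** 2))


if __name__ == "__main__":
    print(hard_convex(b_unsure))
    print(convexity_violation(hard_convex, b_left, b_right))
    print(convexity_violation(non_convex, b_left, b_right))
```

In a training loop, a penalty of this kind could be averaged over sampled belief pairs and added to the usual loss, whereas the max-of-linear form needs no penalty because it cannot represent a non-convex function.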

Code Implementations: 1 repo
