SYLGFeb 23, 2012

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

arXiv:1202.5298v25 citations
Originality Incremental advance
AI Analysis

This work addresses computational efficiency in batch mode reinforcement learning for deterministic settings, offering incremental improvements over existing methods.

The authors tackled the NP-hard minmax optimization problem for two-stage deterministic batch mode reinforcement learning by proposing two relaxation schemes, one dropping constraints for polynomial-time solvability and another using Lagrangian relaxation to form a conic quadratic program, both theoretically and empirically outperforming prior results.

We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes