SY LGFeb 23, 2012

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux

arXiv:1202.5298v25 citations

Originality Incremental advance

AI Analysis

This work addresses computational efficiency in batch mode reinforcement learning for deterministic settings, offering incremental improvements over existing methods.

The authors tackled the NP-hard minmax optimization problem for two-stage deterministic batch mode reinforcement learning by proposing two relaxation schemes, one dropping constraints for polynomial-time solvability and another using Lagrangian relaxation to form a conic quadratic program, both theoretically and empirically outperforming prior results.

We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

View on arXiv PDF

Similar