LG · AI · NE · SY · May 20, 2017

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

arXiv:1705.07262v2 · 12 citations
Originality: Synthesis-oriented
AI Analysis

This work demonstrates that PSO-P is feasible on a realistic industrial reinforcement learning benchmark. The contribution is incremental: it applies an existing method to a new benchmark.

The paper applies the Particle Swarm Optimization Policy (PSO-P) to the Industrial Benchmark (IB), a realistic reinforcement learning benchmark with continuous state and action spaces and stochastic dynamics. PSO-P outperformed closed-form policies derived from the established methods RCNN and NFQ, yielding the best-performing policy with high robustness and little parameter-tuning effort.

The Particle Swarm Optimization Policy (PSO-P) has been recently introduced and proven to produce remarkable results on interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate the properties and feasibility on real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best performing policy in our IB setting. Compared to other well established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.
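
The abstract does not spell out how PSO-P selects actions, so the following is only a minimal sketch of the general idea: particle swarm optimization searches over candidate action sequences, each sequence is scored by rolling it out on a system model, and the first action of the best sequence is the one that would be applied. The dynamics, reward, swarm coefficients, and the names `rollout_return` and `pso_plan` are all hypothetical choices for illustration. This is a toy 1-D problem, not the Industrial Benchmark, and it uses a global-best swarm for brevity, which may differ from the paper's setup.

```python
import random


def rollout_return(state, actions, gamma=0.97):
    """Discounted return of an action sequence on a toy deterministic model."""
    total = 0.0
    for t, a in enumerate(actions):
        state = state + a                      # hypothetical dynamics: s' = s + a
        total += (gamma ** t) * -(state ** 2)  # hypothetical reward: stay near 0
    return total


def pso_plan(state, horizon=5, particles=20, iterations=50, seed=0):
    """Search action sequences in [-1, 1]^horizon with a global-best PSO."""
    rng = random.Random(seed)
    w, c1, c2 = 0.72, 1.49, 1.49  # commonly used inertia / attraction weights

    # Each particle position is one candidate action sequence.
    pos = [[rng.uniform(-1, 1) for _ in range(horizon)] for _ in range(particles)]
    vel = [[0.0] * horizon for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [rollout_return(state, p) for p in pos]

    g = max(range(particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(horizon):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Clamp to the admissible action range.
                pos[i][d] = max(-1.0, min(1.0, pos[i][d] + vel[i][d]))
            fit = rollout_return(state, pos[i])
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return g_pos, g_fit


if __name__ == "__main__":
    actions, ret = pso_plan(state=2.0)
    print("first action to apply:", actions[0])
    print("predicted return:", ret)
```

In a receding-horizon loop, only `actions[0]` would be executed before planning again from the next observed state. Because no closed-form policy is trained, the main design decisions reduce to the model and the swarm settings, which fits the low tuning effort reported above.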

