LG ML Aug 23, 2018

LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations

Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, Eiko Yoneki

arXiv:1808.07903v1 10.841 citations Has Code

Originality: Incremental advance

AI Analysis

This work lowers a practical barrier for data management practitioners by providing a tool that applies reinforcement learning with reduced training overhead. The advance is incremental: it builds on existing RL methods and adds a focus on learning from demonstrations.

The authors address the difficulty of applying deep reinforcement learning to data management tasks by introducing LIFT, an end-to-end software stack that learns from human demonstrations to reduce online training times. In two case studies, database compound indexing and resource management in stream processing, LIFT controllers initialized from demonstrations outperformed human baselines and heuristics by up to 70% across latency metrics and space usage.

Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%.
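The idea of initializing a controller from demonstrations before any online training can be illustrated with a minimal sketch. This is not the LIFT or TensorForce API, and it is not the authors' algorithm: it is a simple majority-vote imitation of demonstrated actions, and every name in it (`pretrain_from_demonstrations`, `act`, the state and action strings) is hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def pretrain_from_demonstrations(demonstrations):
    """Build an initial policy from (state, action) pairs.

    Assumption: states and actions are hashable labels. For each state,
    the policy picks the action that was demonstrated most often.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def act(policy, state, default_action):
    """Return the demonstrated action, or a fallback for unseen states."""
    return policy.get(state, default_action)


# Hypothetical demonstrations: a query shape mapped to an index choice.
demos = [
    ("filter_a_sort_b", "index(a,b)"),
    ("filter_a_sort_b", "index(a,b)"),
    ("filter_a_sort_b", "index(b,a)"),
    ("filter_c", "index(c)"),
]

policy = pretrain_from_demonstrations(demos)
```

In a full system, a policy initialized this way would only be a starting point: online reinforcement learning would then refine it against measured system performance, which is where the reduction in online training time described above would come from.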

