LG ML | Mar 12, 2019

Learning Gaussian Policies from Corrective Human Feedback

Daan Wout, Jan Scholten, Carlos Celemin, Jens Kober

arXiv:1903.05216v1 | 1.81 citations

Originality: Highly original

AI Analysis

This work addresses the challenge of scalable and intuitive policy learning from human feedback for continuous control systems, representing an incremental advancement over existing methods.

The paper tackles the problem of learning policies from corrective human feedback in continuous control tasks by introducing Gaussian Process Coach (GPC). GPC avoids feature engineering and uses the policy's uncertainty to select which states to request feedback on and to adapt the learning rate. The results show that GPC outperforms the state-of-the-art method, COACH, in final performance, convergence rate, and robustness to erroneous feedback on continuous control benchmarks, with improvements demonstrated for both simulated and real human teachers.

Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice offers advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state of the art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature-space engineering for higher-order systems. We introduce Gaussian Process Coach (GPC), which avoids feature-space engineering by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) request feedback on the samples of maximal utility and 2) adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state of the art in final performance, convergence rate and robustness to erroneous feedback on OpenAI Gym continuous control benchmarks, with both simulated and real human teachers.
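The abstract does not spell out GPC's update rules, so the following is only a rough, hypothetical NumPy sketch of the general idea, not the authors' algorithm. It assumes a COACH-style binary correction h in {-1, +1} that shifts the target action at the corrected state by a step size, a Gaussian Process with a squared-exponential kernel as the policy (its posterior mean is the action, its posterior variance the uncertainty), query selection that picks the most uncertain candidate state, and a step size scaled by the predictive standard deviation as a stand-in for the adaptive learning rate. The names `GPPolicy`, `select_query`, `base_step`, and the step-scaling rule are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential kernel between the rows of A (n, d) and B (m, d).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


class GPPolicy:
    """Hypothetical 1-D-action GP policy: posterior mean = action, variance = uncertainty."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
        self.ls, self.sv, self.nv = length_scale, signal_var, noise_var
        self.X = None          # corrected states, shape (n, d)
        self.y = np.empty(0)   # target actions built from corrections, shape (n,)

    def predict(self, s):
        s = np.atleast_2d(s)
        if self.y.size == 0:  # no data yet: zero-mean prior with full prior variance
            return np.zeros(len(s)), np.full(len(s), self.sv)
        K = rbf_kernel(self.X, self.X, self.ls, self.sv) + self.nv * np.eye(len(self.y))
        k = rbf_kernel(self.X, s, self.ls, self.sv)
        mean = k.T @ np.linalg.solve(K, self.y)
        var = self.sv - np.sum(k * np.linalg.solve(K, k), axis=0)
        return mean, np.maximum(var, 0.0)

    def correct(self, s, h, base_step=0.5):
        # Assumed rule: step shrinks as predictive uncertainty at s shrinks.
        mean, var = self.predict(s)
        step = base_step * np.sqrt(var[0] / self.sv)
        target = mean[0] + h * step  # shift the action in the corrected direction
        s = np.atleast_2d(s)
        self.X = s if self.X is None else np.vstack([self.X, s])
        self.y = np.append(self.y, target)


def select_query(policy, candidates):
    # Ask the teacher about the candidate state where the policy is least certain.
    _, var = policy.predict(candidates)
    return candidates[int(np.argmax(var))]


def simulated_teacher(s):
    # Hypothetical teacher whose desired action is 0.5 * state.
    return 0.5 * s[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = GPPolicy(length_scale=0.5)
    for _ in range(50):
        cands = rng.uniform(-1.0, 1.0, size=(20, 1))
        s = select_query(policy, cands)
        a, _ = policy.predict(s)
        h = np.sign(simulated_teacher(s) - a[0])  # binary corrective direction
        if h != 0:
            policy.correct(s, h)
    grid = np.linspace(-1.0, 1.0, 11)[:, None]
    mean, _ = policy.predict(grid)
    print("mean |error| on grid:", np.mean(np.abs(mean - 0.5 * grid[:, 0])))
```

The GP replaces a hand-designed RBF feature grid with a kernel over observed states, which is the feature-engineering advantage the abstract describes; the exact-inference `np.linalg.solve` used here scales cubically with the number of corrections, so a practical version would likely need a sparse or windowed GP.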
