Guided Policy Search for Parameterized Skills using Adverbs
This work addresses sample-efficient policy learning for robotic and other AI agents when dense rewards are unavailable, though it appears incremental because it adapts existing policy search methods to use language feedback.
The paper tackles the problem of adjusting skill parameters using adverb phrases, enabling agents to update skill policies with human language feedback instead of dense environmental rewards. It demonstrates improved sample efficiency over modern policy search methods in two experiments.
We present a method for using adverb phrases to adjust skill parameters via learned adverb-skill groundings. These groundings allow an agent to use adverb feedback provided by a human to directly update a skill policy, in a manner similar to traditional local policy search methods. We show that our method can be used as a drop-in replacement for these policy search methods when dense reward from the environment is not available but human language feedback is. We demonstrate improved sample efficiency over modern policy search methods in two experiments.
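The idea of updating a skill policy directly from adverb feedback can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-dimensional skill parameters (speed, height), the hand-written `GROUNDINGS` table (the paper's groundings are learned), the `simulated_feedback` function standing in for a human, and the fixed step size.

```python
import numpy as np

# Hypothetical, hand-written groundings: each adverb maps to a direction in
# skill-parameter space (speed, height). In the paper these are learned.
GROUNDINGS = {
    "faster": np.array([1.0, 0.0]),
    "slower": np.array([-1.0, 0.0]),
    "higher": np.array([0.0, 1.0]),
    "lower": np.array([0.0, -1.0]),
}


def simulated_feedback(params, target):
    """Stand-in for a human: name the adverb for the largest remaining error."""
    error = target - params
    axis = int(np.argmax(np.abs(error)))
    if axis == 0:
        return "faster" if error[0] > 0 else "slower"
    return "higher" if error[1] > 0 else "lower"


def adverb_update(mean, adverb, step_size):
    """Shift the policy mean along the grounded direction for the adverb."""
    return mean + step_size * GROUNDINGS[adverb]


def run(mean, target, step_size=0.1, iterations=50):
    """Repeat feedback and update; no reward signal is ever queried."""
    for _ in range(iterations):
        adverb = simulated_feedback(mean, target)
        mean = adverb_update(mean, adverb, step_size)
    return mean


# Assumed start and target parameters, chosen only for the demonstration.
final_mean = run(np.array([0.0, 0.0]), np.array([1.0, -0.5]))
```

In this sketch the adverb plays the role that a reward-derived search direction plays in a local policy search step: each round of feedback moves the policy mean without evaluating a dense reward. With a fixed step size the mean ends up oscillating within one step of the target, so a practical variant would likely decay the step size over time.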