ROAI · May 12, 2024

Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

arXiv:2405.07162v3 · 27 citations · h-index: 3 · ICML
Originality: Incremental advance
AI Analysis

This work targets reward function learning, a bottleneck in giving robots a broad repertoire of skills, and learns rewards without human input. The advance is incremental because it builds on existing LLM-based approaches.

The paper tackles the problem of learning reward functions for robot skills by using Large Language Models (LLMs) to propose features and parameters, then refining the parameters through self-alignment to minimize ranking inconsistencies between the LLM and the learnt reward. The method was validated on 9 tasks across 2 simulation environments, showing consistent improvements in training efficacy and efficiency while using significantly fewer GPT tokens than the alternative mutation-based method.

Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid the learning of reward functions. However, the reward function an LLM proposes can be imprecise and thus ineffective, so it needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose the features and parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward function based on execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency while consuming significantly fewer GPT tokens than the alternative mutation-based method.
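
The self-alignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the reward form or the loss, so the code assumes a linear reward over LLM-proposed features and a pairwise logistic loss. The names `reward`, `count_inconsistencies`, and `align`, the two toy features, and the hard-coded `llm_ranking` (standing in for a ranking an LLM would produce from execution feedback) are all hypothetical.

```python
import math

def reward(weights, features):
    """Assumed reward form: a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))

def count_inconsistencies(weights, rollouts, llm_ranking):
    """Count rollout pairs that the reward orders differently from the ranking.

    llm_ranking lists rollout indices from best to worst.
    """
    count = 0
    for i, better in enumerate(llm_ranking):
        for worse in llm_ranking[i + 1:]:
            if reward(weights, rollouts[better]) <= reward(weights, rollouts[worse]):
                count += 1
    return count

def align(weights, rollouts, llm_ranking, lr=0.5, steps=200):
    """Gradient descent on a pairwise logistic loss over the preferred pairs."""
    weights = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for i, better in enumerate(llm_ranking):
            for worse in llm_ranking[i + 1:]:
                diff = [b - w for b, w in zip(rollouts[better], rollouts[worse])]
                margin = reward(weights, diff)
                # Gradient of -log(sigmoid(margin)) w.r.t. the weights
                # is -(1 - sigmoid(margin)) * diff.
                coeff = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
                for k, d in enumerate(diff):
                    grad[k] += coeff * d
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Toy data (hypothetical): each rollout is [progress_to_goal, smoothness].
rollouts = [[0.9, 0.2], [0.5, 0.8], [0.1, 0.9]]
llm_ranking = [0, 1, 2]       # stand-in for the LLM's best-to-worst ranking
initial_weights = [0.1, 1.0]  # imprecise initial parameters

aligned_weights = align(initial_weights, rollouts, llm_ranking)
print(count_inconsistencies(initial_weights, rollouts, llm_ranking))
print(count_inconsistencies(aligned_weights, rollouts, llm_ranking))
```

With this toy data the initial weights disagree with the ranking on all three pairs, and the aligned weights disagree on none. A real pipeline would re-query the LLM and collect new rollouts in each iteration, which this sketch omits.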

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it, not by global fame.
