Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data
This paper addresses the problem of building language-grounded agents that interact with humans or other agents when only limited data is available. Its contribution is an incremental improvement that exploits compositionality in Reward Machines.
The paper tackles the challenge of grounding language in perception and action for situated agents. It proposes Ground-Compose-Reinforce, a neurosymbolic framework that trains RL agents from high-level task specifications without manually designed reward functions or large datasets. With only 350 labelled pretraining trajectories, the framework elicits complex behaviours where non-compositional methods fail.
Grounding language in perception and action is a key challenge when building situated agents that can interact with humans, or other agents, via language. In the past, addressing this challenge has required manually designing the language grounding or curating massive datasets that associate language with the environment. We propose Ground-Compose-Reinforce, an end-to-end, neurosymbolic framework for training RL agents directly from high-level task specifications--without manually designed reward functions or other domain-specific oracles, and without massive datasets. These task specifications take the form of Reward Machines, automata-based representations that capture high-level task structure and are in some cases autoformalizable from natural language. Critically, we show that Reward Machines can be grounded using limited data by exploiting compositionality. Experiments in a custom Meta-World domain with only 350 labelled pretraining trajectories show that our framework faithfully elicits complex behaviours from high-level specifications--including behaviours that never appear in pretraining--while non-compositional approaches fail.
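To make the idea of a Reward Machine concrete, the following is a minimal sketch of one as a finite automaton whose transitions are guarded by formulas over atomic propositions. It is an illustration under stated assumptions, not the authors' implementation: the proposition names (`holding_red`, `at_goal`, `hazard`), the observation keys, the thresholds in `DETECTORS`, and the example task are all hypothetical. The hand-written detectors stand in for groundings that would be learned from labelled trajectories. The sketch is meant to show why compositionality helps with limited data: only the atomic propositions need grounding, and any task built from them reuses those groundings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

# A labelling is the set of atomic propositions that hold at the current step.
Labelling = FrozenSet[str]
Condition = Callable[[Labelling], bool]


# Formula combinators: complex conditions are composed from atoms.
def atom(name: str) -> Condition:
    return lambda label: name in label


def neg(cond: Condition) -> Condition:
    return lambda label: not cond(label)


def conj(*conds: Condition) -> Condition:
    return lambda label: all(c(label) for c in conds)


# Hypothetical detectors standing in for learned groundings of each atom.
# The observation keys and thresholds are assumptions made for illustration.
DETECTORS: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "holding_red": lambda obs: obs["gripper_to_red"] < 0.02,
    "at_goal": lambda obs: obs["dist_to_goal"] < 0.05,
    "hazard": lambda obs: obs["dist_to_hazard"] < 0.05,
}


def label_from_observation(obs: Dict[str, float]) -> Labelling:
    """Ground an observation into the set of atoms that currently hold."""
    return frozenset(name for name, detect in DETECTORS.items() if detect(obs))


@dataclass
class RewardMachine:
    initial: str
    terminal: FrozenSet[str]
    # state -> ordered list of (condition, next_state, reward)
    transitions: Dict[str, List[Tuple[Condition, str, float]]]
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def reset(self) -> None:
        self.state = self.initial

    def step(self, label: Labelling) -> float:
        """Take the first enabled transition; stay put with zero reward otherwise."""
        for cond, next_state, reward in self.transitions.get(self.state, []):
            if cond(label):
                self.state = next_state
                return reward
        return 0.0

    @property
    def done(self) -> bool:
        return self.state in self.terminal


def build_example_machine() -> RewardMachine:
    """Hypothetical task: grasp the red object, then reach the goal, never touching the hazard."""
    return RewardMachine(
        initial="u0",
        terminal=frozenset({"success", "fail"}),
        transitions={
            "u0": [
                (atom("hazard"), "fail", -1.0),
                (conj(atom("holding_red"), neg(atom("hazard"))), "u1", 0.0),
            ],
            "u1": [
                (atom("hazard"), "fail", -1.0),
                (conj(atom("at_goal"), neg(atom("hazard"))), "success", 1.0),
            ],
        },
    )
```

In an RL loop, each environment step would call `label_from_observation` on the new observation and pass the result to `RewardMachine.step` to obtain the reward. A different task over the same three atoms needs only a new transition table, with no additional grounding data.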