Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
This work addresses the problem of enabling AI agents to acquire language skills indirectly, which could reduce reliance on large labeled datasets. The contribution is incremental, however, as it builds on existing meta-RL methods in one specific domain.
The study investigated whether embodied reinforcement learning agents can, like human children, learn language indirectly from non-language tasks. It found that agents trained with meta-RL algorithms generalized to reading floor plans with new layouts and language phrases, and navigated to the correct office without direct language supervision.
Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate language with its meaning requires a dynamic environment with varied language. Therefore, we investigate this question in a multi-task environment with language that varies across the different tasks. Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). Each building includes a floor plan with a simple language description of the goal office's location, which can be visually read as an RGB image when visited. We find RL agents indeed are able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases, and quickly navigate to the correct office, despite receiving no direct language supervision.
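To make the task structure concrete, the following is a minimal toy sketch of a multi-task office environment of the kind described above. It is not the authors' environment. All names (`PHRASES`, `Building`, `sample_building`, `OfficeEnv`) are hypothetical, the floor plan is returned as a plain string rather than an RGB image, and the reward scheme (1 for entering the goal office, 0 otherwise) is an assumption made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical phrases; index i describes office i in this toy layout.
PHRASES = [
    "first office on the left",
    "second office on the left",
    "first office on the right",
    "second office on the right",
]


@dataclass(frozen=True)
class Building:
    """One task: the goal office and the floor-plan phrase describing it."""
    goal_office: int
    phrase: str


def sample_building(rng: random.Random) -> Building:
    """Sample a new task; the goal office differs across buildings."""
    goal = rng.randrange(len(PHRASES))
    return Building(goal_office=goal, phrase=PHRASES[goal])


class OfficeEnv:
    """Toy episode: optionally visit the floor plan, then enter one office."""

    VISIT_FLOOR_PLAN = -1  # any action >= 0 means "enter office <action>"

    def __init__(self, building: Building):
        self.building = building
        self.at_floor_plan = False

    def reset(self) -> Optional[str]:
        self.at_floor_plan = False
        return self._observe()

    def _observe(self) -> Optional[str]:
        # The phrase is visible only while the agent stands at the floor plan.
        return self.building.phrase if self.at_floor_plan else None

    def step(self, action: int) -> Tuple[Optional[str], float, bool]:
        if action == self.VISIT_FLOOR_PLAN:
            self.at_floor_plan = True
            return self._observe(), 0.0, False
        # Entering an office ends the episode; only the goal office is rewarded.
        self.at_floor_plan = False
        reward = 1.0 if action == self.building.goal_office else 0.0
        return self._observe(), reward, True


# Usage with a hand-coded reader (not a learned agent): read the plan, then go.
rng = random.Random(0)
env = OfficeEnv(sample_building(rng))
env.reset()
phrase, _, _ = env.step(OfficeEnv.VISIT_FLOOR_PLAN)
_, reward, done = env.step(PHRASES.index(phrase))
```

The hand-coded reader is given the phrase-to-office mapping directly. In the setting described in the abstract, an agent would have to infer that mapping from reward alone across many sampled buildings, which is what makes language learning a byproduct of the navigation task.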