LG · AI · Nov 29, 2024

CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives

arXiv:2411.19787v2 · 1 citation · h-index: 20 · Has Code · Trans. Mach. Learn. Res.
AI Analysis

This paper addresses the challenge of improving generalization in automated reinforcement learning for language-guided tasks, though the contribution appears incremental, since its auxiliary objectives build on existing video-text retrieval methods.

The paper tackles the problem of grounding instructions in language-guided goal-reaching reinforcement learning by proposing CAREL, a framework that uses cross-modal auxiliary objectives and instruction tracking. The results show superior sample efficiency and systematic generalization in multi-modal RL problems.

Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In automated reinforcement learning, a key concern is to enhance the model's ability to generalize across various tasks and environments. In goal-reaching scenarios, the agent must comprehend the different parts of the instructions within the environmental context in order to complete the overall task successfully. In this work, we propose CAREL (Cross-modal Auxiliary REinforcement Learning) as a new framework to solve this problem using auxiliary loss functions inspired by video-text retrieval literature and a novel method called instruction tracking, which automatically keeps track of progress in an environment. The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is publicly available.
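The abstract describes the auxiliary objectives only as "inspired by video-text retrieval literature". A common loss in that literature is the symmetric InfoNCE (CLIP-style) contrastive loss, which pulls each trajectory embedding toward the embedding of its own instruction and away from the other instructions in the batch. The sketch below shows that generic loss, assuming one embedding vector per trajectory and per instruction; it is not CAREL's exact formulation, and the function names and the default temperature of 0.07 are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at position `idx`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_z


def symmetric_info_nce(
    traj_emb: Sequence[Vector],
    instr_emb: Sequence[Vector],
    temperature: float = 0.07,  # assumed value, borrowed from CLIP's initial setting
) -> float:
    """Hypothetical cross-modal auxiliary loss: pair i is (trajectory i, instruction i)."""
    if len(traj_emb) != len(instr_emb) or not traj_emb:
        raise ValueError("need equally many, non-empty trajectory and instruction embeddings")
    v = [_normalize(x) for x in traj_emb]
    t = [_normalize(x) for x in instr_emb]
    n = len(v)
    # sim[i][j]: temperature-scaled cosine similarity of trajectory i and instruction j
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]
    # Trajectory -> instruction: row i should assign the most mass to column i.
    loss_v2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Instruction -> trajectory: column j should assign the most mass to row j.
    loss_t2v = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

Correctly matched trajectory-instruction pairs yield a loss near zero, while mismatched pairs yield a large loss. Added to the RL objective with some weighting, a term like this would encourage the agent's trajectory representation to encode what the instruction asks for.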

Code Implementations: 1 repo