AI | CL | Nov 1, 2022

Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

Microsoft, MIT
arXiv:2211.00688v1 | 12 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of verifying that actions are feasible and relevant for embodied AI agents in simulated environments. The advance is incremental, since it builds on existing pre-trained models and RL techniques.

The paper tackles the problem of generating feasible action plans for embodied agents that build voxel structures from natural language instructions. It combines a language model, which produces achievable sub-goals, with a pre-trained RL policy, which completes the associated sub-tasks. The method served as the RL baseline at the IGLU 2022 competition.

The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, executing instructions in real or simulated environments requires verifying both the feasibility of actions and their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment according to natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes the associated sub-tasks with a pre-trained RL policy. The proposed method formed the RL baseline at the IGLU 2022 competition.
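
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the `SubGoal` type, and the `toy_propose` and `toy_policy` stand-ins for the language model and the RL policy are all hypothetical, chosen only to show how proposed sub-goals might be filtered for feasibility before being handed to a policy.

```python
from dataclasses import dataclass

# Assumed build-zone size (x, y, z); hypothetical, not taken from the paper.
GRID = (11, 9, 11)


@dataclass(frozen=True)
class SubGoal:
    """One block placement: a target cell and a colour."""
    x: int
    y: int
    z: int
    color: str


def is_feasible(goal, occupied):
    """A sub-goal is feasible if its cell is inside the grid and still empty."""
    cell = (goal.x, goal.y, goal.z)
    in_bounds = all(0 <= c < n for c, n in zip(cell, GRID))
    return in_bounds and cell not in occupied


def plan_subgoals(instruction, propose, occupied):
    """Stage 1: ask the proposer for sub-goals and keep only feasible ones."""
    taken = set(occupied)
    plan = []
    for goal in propose(instruction):
        if is_feasible(goal, taken):
            plan.append(goal)
            taken.add((goal.x, goal.y, goal.z))
    return plan


def execute(plan, policy, occupied):
    """Stage 2: give each sub-goal to the policy and record its successes."""
    world = set(occupied)
    for goal in plan:
        if policy(goal, world):
            world.add((goal.x, goal.y, goal.z))
    return world


def toy_propose(instruction):
    """Hypothetical stand-in for the language model: fixed proposals,
    including one duplicate and one out-of-bounds cell."""
    return [
        SubGoal(0, 0, 0, "red"),
        SubGoal(0, 1, 0, "red"),
        SubGoal(0, 1, 0, "red"),   # duplicate cell, filtered out
        SubGoal(0, 99, 0, "red"),  # outside the grid, filtered out
    ]


def toy_policy(goal, world):
    """Hypothetical stand-in for the pre-trained RL policy: always succeeds."""
    return True


plan = plan_subgoals("build a red column of two blocks", toy_propose, set())
world = execute(plan, toy_policy, set())
```

Separating the feasibility check from execution means the policy only ever receives sub-goals that can be completed in the current state, which is the role the abstract assigns to the sub-goal generation step.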

Code Implementations: 1 repo
