CVMar 18, 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

arXiv:2403.12037v254 citationsh-index: 18
Originality Incremental advance
AI Analysis

This addresses the challenge of instruction-following for AI agents in open-world simulations, representing an incremental improvement using existing methods.

The paper tackles the problem of creating generalist embodied agents that can follow abstract sequential instructions in simulated environments like Minecraft, achieving nearly double the performance of the best baseline agent.

It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator with an innovative paradigm that enhances instruction-following ability in low-level control signal generation. Specifically, MineDreamer is developed on top of recent advances in Multimodal Large Language Models (MLLMs) and diffusion models, and we employ a Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of executing instructions and translating imaginations into more precise visual prompts tailored to the current state; subsequently, the agent generates keyboard-and-mouse actions to efficiently achieve these imaginations, steadily following the instructions at each step. Extensive experiments demonstrate that MineDreamer follows single and multi-step instructions steadily, significantly outperforming the best generalist agent baseline and nearly doubling its performance. Moreover, qualitative analysis of the agent's imaginative ability reveals its generalization and comprehension of the open world.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes