AI · CL · LG · RO · May 22, 2018

Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents

arXiv:1805.08329v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses language grounding for embodied AI agents. It proposes a new module that improves task performance in 3D environments; the advance is incremental because it builds on existing deep reinforcement learning approaches.

The paper tackles the problem of training embodied agents to perform language-directed tasks in virtual environments by proposing a neural language grounding module that transforms visual features using sentence embeddings, achieving significant performance improvements over state-of-the-art methods in navigation tasks with partial observability.

Recently there has been a rising interest in training agents, embodied in virtual environments, to perform language-directed tasks by deep reinforcement learning. In this paper, we propose a simple but effective neural language grounding module for embodied agents that can be trained end to end from scratch taking raw pixels, unstructured linguistic commands, and sparse rewards as the inputs. We model the language grounding process as a language-guided transformation of visual features, where latent sentence embeddings are used as the transformation matrices. In several language-directed navigation tasks that feature challenging partial observability and require simple reasoning, our module significantly outperforms the state of the art. We also release XWorld3D, an easy-to-customize 3D environment that can potentially be modified to evaluate a variety of embodied agents.
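The abstract's core idea, using a sentence embedding as a transformation matrix applied to visual features, can be illustrated with a small pure-Python sketch. This is not the authors' implementation. It assumes that each sentence embedding is mapped to a d x d matrix by a hypothetical linear projection (`proj`), that the matrix is applied independently to the feature vector at every spatial location, that a ReLU follows each transformation, and that K such steps are chained. The helper names (`matvec`, `relu`, `sentence_to_matrix`, `gft`) are hypothetical, and the paper's exact parameterization (number of steps, bias terms, how the embeddings are produced) may differ.

```python
def matvec(m, v):
    # Multiply a matrix given as a list of rows by a vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    # Element-wise ReLU nonlinearity.
    return [max(0.0, x) for x in v]


def sentence_to_matrix(sentence_emb, proj, d):
    # Hypothetical: project the sentence embedding to d*d values with a
    # learned (d*d) x len(sentence_emb) matrix, then reshape to d x d.
    flat = matvec(proj, sentence_emb)
    return [flat[i * d:(i + 1) * d] for i in range(d)]


def gft(feature_map, sentence_embs, projs, d):
    """Language-guided transformation of an H x W grid of d-dim features.

    Assumption: one sentence embedding and one projection per step; the
    resulting matrix is applied to every spatial location, followed by ReLU.
    """
    out = [[list(v) for v in row] for row in feature_map]
    for s, p in zip(sentence_embs, projs):
        t = sentence_to_matrix(s, p, d)  # the sentence decides the transform
        out = [[relu(matvec(t, v)) for v in row] for row in out]
    return out


# Usage: a 1 x 2 feature map with d = 2 and a single transformation step.
fmap = [[[1.0, 2.0], [3.0, 4.0]]]
proj_flip = [[-1.0], [0.0], [0.0], [1.0]]  # maps embedding [1.0] to diag(-1, 1)
print(gft(fmap, [[1.0]], [proj_flip], 2))  # prints [[[0.0, 2.0], [0.0, 4.0]]]
```

Because the transformation matrix is computed from the sentence rather than fixed, the same visual features are filtered differently for different commands: feeding a different embedding through the same projection yields a different matrix and therefore a different output feature map.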

Code Implementations: 1 repo
