CL · LG · Mar 28, 2017

A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment

arXiv:1703.09831v3 · 26 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of human-like language acquisition for AI agents in virtual environments. The advance is incremental because it builds on existing compositional learning approaches.

The paper trains an agent to learn language from scratch in a 2D maze environment. Through compositional learning, the trained agent can execute zero-shot commands, meaning commands that contain novel word combinations or new object concepts.

We tackle a task where an agent learns to navigate in a 2D maze-like environment called XWORLD. In each session, the agent perceives a sequence of raw-pixel frames, a natural language command issued by a teacher, and a set of rewards. The agent learns the teacher's language from scratch in a grounded and compositional manner, such that after training it is able to correctly execute zero-shot commands: 1) the combination of words in the command never appeared before, and/or 2) the command contains new object concepts that are learned from another task but never learned from navigation. Our deep framework for the agent is trained end to end: it learns simultaneously the visual representations of the environment, the syntax and semantics of the language, and the action module that outputs actions. The zero-shot learning capability of our framework results from its compositionality and modularity with parameter tying. We visualize the intermediate outputs of the framework, demonstrating that the agent truly understands how to solve the problem. We believe that our results provide some preliminary insights on how to train an agent with similar abilities in a 3D environment.
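To make the first zero-shot condition concrete (word combinations that never appeared during training), here is a minimal Python sketch of a held-out split over command combinations. The vocabulary, the verb-object command shape, and the function name `zero_shot_split` are all hypothetical illustrations; the abstract does not specify XWORLD's command grammar or the authors' data pipeline.

```python
import itertools
import random

# Hypothetical vocabulary, for illustration only (not taken from the paper).
VERBS = ["go to", "move to", "reach"]
OBJECTS = ["apple", "banana", "cup", "dog"]


def zero_shot_split(verbs, objects, n_held_out, seed=0):
    """Split (verb, object) pairs into training and held-out sets.

    Held-out pairs never occur as a combination in training, but every
    individual verb and object still occurs in at least one training pair.
    """
    pairs = list(itertools.product(verbs, objects))
    rng = random.Random(seed)
    rng.shuffle(pairs)

    train, held_out = list(pairs), []
    for pair in pairs:
        if len(held_out) == n_held_out:
            break
        remaining = [p for p in train if p != pair]
        verb_still_seen = any(v == pair[0] for v, _ in remaining)
        object_still_seen = any(o == pair[1] for _, o in remaining)
        # Hold the pair out only if both of its words remain covered in training.
        if verb_still_seen and object_still_seen:
            train = remaining
            held_out.append(pair)
    return train, held_out


train_pairs, zero_shot_pairs = zero_shot_split(VERBS, OBJECTS, n_held_out=4)
print("train:", train_pairs)
print("zero-shot:", zero_shot_pairs)
```

An agent evaluated on `zero_shot_pairs` has seen each word individually but never in that combination, which is the setting the abstract's first condition describes. The second condition (object concepts learned from another task) would need a separate source of supervision and is not covered by this sketch.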

