AI
Jun 1, 2017

Grounding Symbols in Multi-Modal Instructions

arXiv:1706.00355v1
1089 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of enabling robots to understand rich, variable instructions from specific users in real-world settings. The contribution is incremental, as it builds on existing multi-modal learning approaches.

The paper tackles the problem of grounding symbols in multi-modal instructions for robots in semi-structured environments. Its model learns a user's notion of color and shape from a small number of demonstrations and generalizes to identifying physical referents for novel word combinations.

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input (i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations) to produce the segmentation of objects with a corresponding association to high-level concepts. To test our framework, we present experiments in a table-top object manipulation scenario. Our results show our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.

