AI · Jun 1, 2017

Grounding Symbols in Multi-Modal Instructions

Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy

arXiv:1706.00355v157.61089 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enabling robots to understand rich, variable instructions from specific users in real-world settings, though it is incremental in that it builds on existing multi-modal learning approaches.

The paper tackles the problem of grounding symbols in multi-modal instructions for robots in semi-structured environments, achieving the ability to learn a user's notion of color and shape from a small number of demonstrations and generalize to identifying physical referents for novel word combinations.

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability---for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input---i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations---to produce the segmentation of objects with a correspondent association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.
