CL, LG · Mar 12, 2020

Learning word-referent mappings and concepts from raw inputs

arXiv:2003.05573v1 · 7 citations
AI Analysis

The paper addresses how children might learn correspondences between language and the world. The contribution is incremental: it extends prior cross-situational learning models so that they handle raw inputs.

The paper tackles the problem of learning word-referent mappings from noisy, ambiguous, naturalistic input. It develops a neural network model that operates on raw images and words, generalizes to novel instances of known words, and exhibits behaviors such as a preference for mutual exclusivity.
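Cross-situational learning means tracking which words co-occur with which referents across many situations, so that regularities emerge even when every single situation is ambiguous. The sketch below shows the idea on symbolic inputs, the kind of simplified representation earlier models used. It is a plain co-occurrence counter, an associative baseline rather than the authors' neural network, and the mutual-exclusivity rule ("give a novel word an object no known word already claims") is a crude stand-in for the graded preference the model shows. The toy words, object labels, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cross_situational_counts(situations):
    """Tally word-object co-occurrences across situations.

    Each situation is (words, objects): the words in one utterance and the
    objects in the accompanying scene. Within a single situation any word
    could name any object, so every pairing gets one count.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in situations:
        for word, obj in product(set(words), set(objects)):
            counts[word][obj] += 1
    return counts


def best_referent(counts, word):
    """Return the object that co-occurred most often with `word`."""
    candidates = counts[word]
    return max(candidates, key=candidates.get)


def mutual_exclusivity_choice(counts, known_words, scene):
    """Map a novel word to an object that no known word already claims."""
    claimed = {best_referent(counts, w) for w in known_words}
    unclaimed = [obj for obj in scene if obj not in claimed]
    return unclaimed[0] if unclaimed else None


# Hypothetical toy data: each situation has two words and two objects, so
# no single situation reveals which word goes with which object.
situations = [
    (["ball", "dog"], ["BALL", "DOG"]),
    (["ball", "cup"], ["BALL", "CUP"]),
    (["dog", "cup"], ["DOG", "CUP"]),
]
counts = cross_situational_counts(situations)

for word in ["ball", "dog", "cup"]:
    print(word, "->", best_referent(counts, word))

# A novel word heard alongside one familiar and one unfamiliar object.
print("dax ->", mutual_exclusivity_choice(counts, ["ball", "dog", "cup"], ["BALL", "SPOON"]))
```

Across the three situations each word co-occurs twice with its true referent and only once with each distractor, so the counter recovers ball → BALL, dog → DOG, and cup → CUP; the novel word "dax" is then mapped to SPOON. What this baseline cannot do, and what the paper targets, is work from raw pixels or recognize a new instance of a known object category.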

How do children learn correspondences between language and the world from noisy, ambiguous, naturalistic input? One hypothesis is via cross-situational learning: tracking words and their possible referents across multiple situations allows learners to disambiguate correct word-referent mappings (Yu & Smith, 2007). However, previous models of cross-situational word learning operate on highly simplified representations, side-stepping two important aspects of the actual learning problem. First, how can word-referent mappings be learned from raw inputs such as images? Second, how can these learned mappings generalize to novel instances of a known word? In this paper, we present a neural network model trained from scratch via self-supervision that takes in raw images and words as inputs, and show that it can learn word-referent mappings from fully ambiguous scenes and utterances through cross-situational learning. In addition, the model generalizes to novel word instances, locates referents of words in a scene, and shows a preference for mutual exclusivity.
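The abstract describes a model that scores raw images against words, learns only from which images and utterances were paired (self-supervision), and can locate a word's referent in a scene. One common way to get all three is sketched below, under explicit assumptions: the word-image match score is the best dot product between a word vector and per-region image features, and training uses a softmax contrastive loss that favors the paired image over the other images in a batch. These choices, the hand-set 2-D features (standing in for learned CNN features and word embeddings), and every name in the code are hypothetical; this is an illustration of self-supervised word-image matching, not the authors' published architecture or loss, and no gradient updates are shown.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def match_score(word_vec, region_feats):
    """Score one word against one image (assumed max-over-regions scoring).

    `region_feats` holds one feature vector per spatial location; in a real
    model these would come from a network applied to raw pixels. The score
    is the best word-region similarity, and the argmax location serves as a
    guess at where the word's referent is in the scene.
    """
    scores = [dot(word_vec, r) for r in region_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def contrastive_loss(word_vec, images, positive):
    """Softmax cross-entropy over a batch of images (assumed loss).

    The only supervision is the pairing: the image that co-occurred with
    the word (index `positive`) should outscore the other images.
    """
    logits = [match_score(word_vec, img)[0] for img in images]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive]


# Hypothetical 2-D features: two images, each with two regions.
word_vec = [1.0, 0.0]
images = [
    [[0.1, 0.9], [0.8, 0.2]],  # image 0: region 1 resembles the word
    [[0.0, 1.0], [0.2, 0.3]],  # image 1: no region matches well
]

print(match_score(word_vec, images[0]))       # (0.8, 1)
print(contrastive_loss(word_vec, images, 0))  # lower loss: correct pairing
print(contrastive_loss(word_vec, images, 1))  # higher loss: wrong pairing
```

Taking the maximum over regions lets a single word match one object in a cluttered scene, which is what makes localization fall out of the score. Under these assumptions, minimizing the loss over many ambiguous image-utterance pairs pulls each word toward the visual features that consistently accompany it, a continuous analogue of cross-situational counting.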
