CL, Apr 18, 2021

Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction

arXiv:2104.08874v1727 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of developing more human-like language understanding systems by leveraging natural interaction data, though its application of grounding to existing search engine contexts is incremental.

The paper tackles the problem of learning grounded language semantics from real-world human-machine interactions, specifically search engine data. It shows that its approach achieves better compositional generalization and zero-shot inference than state-of-the-art non-grounded models such as word2vec and BERT.

We investigate grounded language learning through real-world data, by modelling a teacher-learner dynamics through the natural interactions occurring between users and search engines; in particular, we explore the emergence of semantic generalization from unsupervised dense representations outside of synthetic environments. A grounding domain, a denotation function and a composition function are learned from user data only. We show how the resulting semantics for noun phrases exhibits compositional properties while being fully learnable without any explicit labelling. We benchmark our grounded semantics on compositionality and zero-shot inference tasks, and we show that it provides better results and better generalizations than SOTA non-grounded models, such as word2vec and BERT.
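
To make the three ingredients named in the abstract concrete, here is a minimal toy sketch of a grounding domain, a denotation function, and a composition function. It is not the paper's implementation: the product vectors, the click log, the mean-pooling denotation, and the averaging composition are all illustrative assumptions, standing in for representations and functions that the paper learns from real user data.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical grounding domain: product id -> dense vector (made-up values).
PRODUCTS = {
    "nike_shoes": [1.0, 0.0, 1.0],
    "nike_shirt": [1.0, 1.0, 0.0],
    "adidas_shoes": [0.0, 0.0, 1.0],
}

# Hypothetical interaction log: (query, product the user clicked afterwards).
CLICKS = [
    ("nike", "nike_shoes"),
    ("nike", "nike_shirt"),
    ("shoes", "nike_shoes"),
    ("shoes", "adidas_shoes"),
]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def learn_denotations(clicks, products):
    """Assumed denotation: a query denotes the mean of the products clicked for it."""
    by_query = defaultdict(list)
    for query, product_id in clicks:
        by_query[query].append(products[product_id])
    return {query: mean(vectors) for query, vectors in by_query.items()}


def compose(u, v):
    """Assumed composition: plain averaging, a stand-in for a learned function."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def nearest_product(vector, products):
    """Return the id of the product whose vector is most similar to `vector`."""
    return max(products, key=lambda pid: cosine(vector, products[pid]))


DENOTATIONS = learn_denotations(CLICKS, PRODUCTS)

# "nike shoes" never occurs as a query in CLICKS, so its meaning is built
# only from the denotations of its parts (a toy zero-shot setting).
PHRASE = compose(DENOTATIONS["nike"], DENOTATIONS["shoes"])
print(nearest_product(PHRASE, PRODUCTS))
```

In this toy setting the composed vector for the unseen phrase lands closest to the product that matches both words, which is the kind of behaviour the compositionality and zero-shot benchmarks in the abstract are designed to measure at scale.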

