Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
This paper addresses the problem of developing more human-like language models, a question of interest to AI researchers; it is a research proposal rather than an implementation.
The paper proposes that language acquisition requires embodiment, interaction, and emotion, all of which current text-only models lack, and sketches a model combining transformers and word-level grounded learning for a robot-dialogue system.
Humans' experience of the world is profoundly multimodal from the beginning, so why do existing state-of-the-art language models use only text as a modality to learn and represent semantic meaning? In this paper we review the literature on the role of embodiment and emotion in the interactive setting of spoken dialogue as necessary prerequisites for language learning in human children, including evidence that the words in children's vocabularies are largely concrete at first and become more abstract as the children get older. We sketch a model of semantics that leverages current transformer-based models and a word-level grounded model. We then explain the robot-dialogue system that will make use of our semantic model, the setting in which the system will learn language, and existing benchmarks for evaluation.
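
To make the idea of pairing a text-derived representation with a word-level grounded model more concrete, the following is a minimal sketch under stated assumptions, not the proposal's actual design. It assumes that each word owns a small logistic classifier over perceptual features and that the classifier's parameters are simply concatenated with a text-derived vector. The names `GroundedWord` and `combine`, the three-dimensional colour-like percepts, and the placeholder text vector (standing in for a transformer embedding) are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


class GroundedWord:
    """Hypothetical word-level grounded model: one logistic classifier per word."""

    def __init__(self, n_features: int) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias: float = 0.0

    def score(self, percept: Sequence[float]) -> float:
        """Estimated fit between this word and a perceptual feature vector."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, percept))
        return sigmoid(z)

    def update(self, percept: Sequence[float], label: int, lr: float = 0.5) -> None:
        """One gradient step on the logistic loss for a single observation."""
        error = label - self.score(percept)
        self.weights = [w + lr * error * x for w, x in zip(self.weights, percept)]
        self.bias += lr * error


def combine(text_vector: Sequence[float], word: GroundedWord) -> List[float]:
    """Assumed fusion: concatenate the text vector with the classifier parameters."""
    return list(text_vector) + list(word.weights) + [word.bias]


# Toy interaction: the word "red" is heard with a red-ish percept (positive
# example) and is not used for a blue-ish percept (negative example).
red = GroundedWord(n_features=3)
for _ in range(50):
    red.update([1.0, 0.0, 0.0], label=1)
    red.update([0.0, 0.0, 1.0], label=0)

# Placeholder values standing in for a transformer embedding of "red".
text_vector = [0.1, -0.2, 0.3, 0.0]
fused = combine(text_vector, red)
```

Concatenation is only one way to fuse the two sources of meaning; it is used here because it keeps the text-derived and perception-derived parts separately inspectable.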