Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research
This paper addresses the fragmentation of AI research into single-modality tasks, which stands in the way of human-like semantic understanding; it is an incremental strategic proposal rather than a breakthrough.
The paper tackles the challenge of developing sophisticated natural language semantics in AI. It proposes virtual embodiment as a long-term, multi-modal strategy that grounds meaning in sensori-motor experience, with the ultimate aim of achieving human-level intelligence.
Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artificial Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to that of humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion.