HC, RO · Feb 4, 2019

Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality

Elena Sibirtseva, Ali Ghadirzadeh, Iolanda Leite, Mårten Björkman, Danica Kragic

arXiv:1902.01117v1 · 7.64 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of natural communication in collaborative human-robot tasks, but it appears incremental because it builds on existing multimodal approaches.

The paper tackled the problem of disambiguating multimodal referring expressions in human-robot interaction by modeling temporal dependencies across head movements, hand gestures, and speech, and found that using a temporal prior increased the model's predictive power.

In collaborative tasks, people rely on both verbal and non-verbal cues simultaneously to communicate with each other. For human-robot interaction to run smoothly and naturally, a robot should be able to robustly disambiguate referring expressions. In this work, we propose a model that can disambiguate multimodal fetching requests using head movements, hand gestures, and speech. We analysed the data acquired from mixed reality experiments and formulated the hypothesis that modelling temporal dependencies of events in these three modalities increases the model's predictive power. We evaluated our model within a Bayesian framework for interpreting referring expressions, both with and without a temporal prior.
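The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes a naive-Bayes fusion in which each modality contributes an independent likelihood over candidate objects, so that P(o | obs) ∝ P(o) · Π_m P(obs_m | o). It then compares a uniform prior with a hypothetical temporal prior that gives more mass to objects referenced by non-verbal events (for example, a head turn or pointing gesture) close in time to a reference moment such as speech onset. The function names, the Gaussian time kernel, `sigma`, `eps`, and the example objects and probabilities are all illustrative assumptions, not the authors' formulation or data.

```python
import math


def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def posterior(objects, likelihoods, prior=None):
    """Naive-Bayes fusion (assumed): P(o | obs) ∝ P(o) * prod_m P(obs_m | o).

    likelihoods maps a modality name to {object: P(obs_m | object)}.
    If no prior is given, a uniform prior over objects is used.
    """
    if prior is None:
        prior = {o: 1.0 / len(objects) for o in objects}
    scores = {}
    for o in objects:
        p = prior[o]
        for modality in likelihoods.values():
            p *= modality[o]
        scores[o] = p
    return normalize(scores)


def temporal_prior(objects, events, t_ref, sigma=0.5, eps=1e-3):
    """Hypothetical temporal prior over objects.

    events is a list of (timestamp_seconds, object) pairs from non-verbal cues.
    Each event adds a Gaussian weight based on its distance in time from t_ref;
    eps keeps every object's prior strictly positive.
    """
    scores = {o: eps for o in objects}
    for t, obj in events:
        scores[obj] += math.exp(-((t - t_ref) ** 2) / (2 * sigma ** 2))
    return normalize(scores)


if __name__ == "__main__":
    objects = ["red_cup", "blue_cup", "green_box"]
    # Illustrative likelihoods: speech ("the cup") and gesture alone cannot
    # separate the two cups.
    likelihoods = {
        "speech": {"red_cup": 0.45, "blue_cup": 0.45, "green_box": 0.10},
        "gesture": {"red_cup": 0.40, "blue_cup": 0.40, "green_box": 0.20},
    }
    print("uniform prior: ", posterior(objects, likelihoods))
    # A head/gesture event toward blue_cup occurs close to speech onset (t = 1.2 s).
    prior = temporal_prior(objects, [(1.0, "blue_cup"), (3.0, "red_cup")], t_ref=1.2)
    print("temporal prior:", posterior(objects, likelihoods, prior))
```

In this toy setup the uniform prior leaves the two cups tied, while the timing-based prior breaks the tie in favour of the object referenced closest to speech onset. This mirrors, in a simplified form, the intuition that temporal structure across modalities can add predictive power.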

