AI HC RO
Aug 12, 2023

Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction

Kaiqi Chen, Jing Yu Lim, Kingsley Kuan, Harold Soh

arXiv:2308.06498v1
3.91 citations
h-index: 26

Originality: Incremental advance

AI Analysis

This paper addresses uncertainty in perspective-taking for human-robot interaction. The advance appears incremental, as it builds on established probabilistic graphical models and deep learning methods.

The paper tackles perspective-taking in partially-observable human-robot interactions by proposing a deep world model with a decomposed multi-modal latent state space. The authors report that it significantly outperforms existing baselines in predicting human observations and beliefs on three tasks.

Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved problem; existing approaches that use deterministic or handcrafted methods are unable to accurately account for uncertainty in partially-observable settings. This work proposes to address this limitation via a deep world model that enables a robot to perform both perceptual and conceptual perspective-taking, i.e., the robot is able to infer what a human sees and believes. The key innovation is a decomposed multi-modal latent state space model able to generate and augment fictitious observations/emissions. Optimizing the ELBO that arises from this probabilistic graphical model enables the learning of uncertainty in latent space, which facilitates uncertainty estimation from high-dimensional observations. We task our model with predicting human observations and beliefs on three partially-observable HRI tasks. Experiments show that our method significantly outperforms existing baselines and is able to infer the visual observations available to other agents and their internal beliefs.
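To make the ELBO mentioned in the abstract concrete, here is a minimal sketch of a single-step evidence lower bound for a toy one-dimensional Gaussian latent variable model. It is not the LEAPT objective: the function names (`gaussian_kl`, `gaussian_log_prob`, `elbo_step`), the unit observation noise, and the scalar latent are all assumptions chosen for illustration. It only shows how a learned posterior spread (`sigma_q`) enters the objective through the KL term.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two univariate Gaussians (closed form)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)


def gaussian_log_prob(x, mu, sigma):
    """Log density of x under N(mu, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def elbo_step(x, z_sample, mu_q, sigma_q, mu_p, sigma_p, obs_sigma=1.0):
    """Single-sample ELBO estimate: reconstruction term minus KL term.

    Hypothetical toy model: the observation x is assumed to be
    Gaussian around the latent sample z_sample with noise obs_sigma.
    """
    reconstruction = gaussian_log_prob(x, z_sample, obs_sigma)
    regularizer = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    return reconstruction - regularizer


# Posterior q = N(0.5, 0.8^2) against a standard normal prior.
value = elbo_step(x=0.4, z_sample=0.5, mu_q=0.5, sigma_q=0.8,
                  mu_p=0.0, sigma_p=1.0)
print(value)
```

In a sequential world model the same two terms would typically be summed over time steps and modalities, with the prior conditioned on the previous latent state; that extension is left out here.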
