Disentangled Sequence Clustering for Human Intention Inference
This work addresses the challenge of enabling robots to collaborate effectively with humans by inferring intent without supervision, though the contribution is incremental in that it builds on existing unsupervised learning methods.
The paper tackles the problem of inferring human intent in human-robot interaction without requiring task-specific labels. It proposes the DiSCVAE framework, which learns a distribution of intent in an unsupervised manner, and shows on a robotic wheelchair dataset that the inferred discrete variable aligns with actual human intent.
Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of "intent" conditioned on the robot's perceived state. However, these approaches typically assume task-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latent representations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction dataset collected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.
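To make the latent structure described above concrete, the following is a minimal toy sketch of a generative process with a discrete cluster variable, a time-invariant global latent, and time-varying local latents. It is not the DiSCVAE implementation: the uniform cluster prior, the AR(1) local dynamics, the additive "decoder", and all names (`sample_sequence`, `cluster_posterior`, `means`) are assumptions chosen purely for illustration, and no neural networks or variational inference are involved.

```python
import math
import random


def sample_sequence(rng, means, seq_len=5, global_std=0.1, local_std=0.5):
    """Draw one sequence from a toy disentangled mixture model (illustrative only)."""
    # Discrete cluster variable y, e.g. a high-level intention (uniform prior assumed).
    y = rng.randrange(len(means))
    # Time-invariant global latent, drawn from the mixture component selected by y.
    z_global = [rng.gauss(m, global_std) for m in means[y]]
    xs, z_prev = [], 0.0
    for _ in range(seq_len):
        # Time-varying local latent with assumed AR(1) dynamics.
        z_local = rng.gauss(0.9 * z_prev, local_std)
        # Hypothetical additive "decoder": each observation mixes global and local parts.
        xs.append([g + z_local for g in z_global])
        z_prev = z_local
    return y, z_global, xs


def cluster_posterior(z_global, means, global_std=0.1):
    """Responsibilities p(y | z_global) under equal-weight isotropic Gaussians."""
    log_p = [
        -sum((z - m) ** 2 for z, m in zip(z_global, mu)) / (2 * global_std ** 2)
        for mu in means
    ]
    top = max(log_p)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
means = [[-1.0, -1.0], [1.0, 1.0]]  # two hypothetical intent clusters
y, z_global, xs = sample_sequence(rng, means)
posterior = cluster_posterior(z_global, means)
```

In this toy setting the cluster with the highest responsibility recovers the sampled `y`, which mirrors, in a much simplified form, the idea of clustering over global sequence attributes while local features vary over time.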