Generative Hybrid Representations for Activity Forecasting with No-Regret Learning
This work addresses the challenge of representing varied human behaviors for assistive systems, though its central idea, combining continuous and discrete representations, is incremental.
The paper develops a deep generative model that jointly forecasts a person's discrete actions and continuous motions; on the EPIC-KITCHENS dataset, it generates high-quality, diverse samples and generalizes better than related generative models.
Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems. Part of this difficulty stems from learning systems' inability to represent all kinds of behaviors. Some behaviors, such as motion, are best described with continuous representations, whereas others, such as picking up a cup, are best described with discrete representations. Furthermore, human behavior is generally not fixed: people can change their habits and routines. This suggests these systems must be able to learn and adapt continuously. In this work, we develop an efficient deep generative model to jointly forecast a person's future discrete actions and continuous motions. On a large-scale egocentric dataset, EPIC-KITCHENS, we observe our method generates high-quality and diverse samples while exhibiting better generalization than related generative models. Finally, we propose a variant to continually learn our model from streaming data, observe its practical effectiveness, and theoretically justify its learning efficiency.