Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic
This work addresses the challenge of predicting ambiguous future representations from limited observations in video analysis, and makes incremental progress in the field.
The research tackles video anticipation by advancing from predicting coarse, deterministic future representations to forecasting continuous, fine-grained stochastic ones, with applications in action anticipation and human motion forecasting.
Video anticipation is the task of predicting one or more future representations given a limited, partial observation. The task is challenging because, given a limited observation, the future representation can be highly ambiguous. Based on the nature of the task, video anticipation can be considered from two viewpoints: the level of detail and the level of determinism in the predicted future. In this research, we start from anticipating a coarse representation of a deterministic future and then move towards predicting continuous and fine-grained future representations of a stochastic process. An example of the former is video action anticipation, in which we are interested in predicting one action label given a partially observed video; an example of the latter is forecasting multiple diverse continuations of human motion given a partially observed motion sequence. In particular, in this thesis, we make several contributions to the literature of video anticipation...