Finding Islands of Predictability in Action Forecasting
This work addresses the prediction of long-duration future action sequences for applications such as video analysis, but it appears incremental, as it builds on existing methods by combining them in a new way.
The paper tackles dense action forecasting by modeling future action sequences at variable levels of abstraction, selected dynamically during prediction, and shows that this approach maintains fine-grained predictions while yielding substantial, monotonic increases in accuracy.
We address dense action forecasting: the problem of predicting future action sequences over long durations based on partial observation. Our key insight is that future action sequences are more accurately modeled with variable levels of abstraction rather than a single level, and that the optimal level of abstraction can be dynamically selected during the prediction process. Our experiments show that most parts of future action sequences can be predicted confidently in fine detail only in small segments of future frames, which are effectively ``islands'' of high model prediction confidence in a ``sea'' of uncertainty. We propose a combination of a Bayesian neural network and a hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels. We evaluate this approach on standard datasets against existing state-of-the-art systems and demonstrate that our ``islands of predictability'' approach maintains fine-grained action predictions while also making accurate abstract predictions where previous systems were unable to do so, and thus results in substantial, monotonic increases in accuracy.
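
The idea of selecting an abstraction level per future frame can be illustrated with a minimal sketch. This is not the model described above: it replaces the Bayesian neural network and the hierarchical convolutional segmentation model with a simple confidence threshold on hand-written probabilities. The action labels, the two-level hierarchy `PARENT`, the function names, and the threshold of 0.7 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy: fine-grained action -> coarse parent action.
PARENT: Dict[str, str] = {
    "crack_egg": "prepare_egg",
    "whisk_egg": "prepare_egg",
    "pour_milk": "prepare_drink",
    "stir_coffee": "prepare_drink",
}


def coarsen(fine_probs: Dict[str, float]) -> Dict[str, float]:
    """Sum fine-grained probabilities into their coarse parent actions."""
    coarse: Dict[str, float] = {}
    for label, p in fine_probs.items():
        parent = PARENT[label]
        coarse[parent] = coarse.get(parent, 0.0) + p
    return coarse


def select_level(fine_probs: Dict[str, float], threshold: float = 0.7) -> Tuple[str, str]:
    """Return (level, label) for the finest level whose top probability clears the threshold."""
    for level, probs in (("fine", fine_probs), ("coarse", coarsen(fine_probs))):
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            return level, label
    # Neither level is confident enough: fall back to the most abstract prediction.
    return "root", "any_action"


def forecast(frames: List[Dict[str, float]], threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Pick an abstraction level independently for each future frame."""
    return [select_level(probs, threshold) for probs in frames]


# Assumed per-frame predictive probabilities over fine-grained actions
# (for example, averaged over several stochastic forward passes).
frames = [
    {"crack_egg": 0.90, "whisk_egg": 0.05, "pour_milk": 0.03, "stir_coffee": 0.02},
    {"crack_egg": 0.45, "whisk_egg": 0.40, "pour_milk": 0.10, "stir_coffee": 0.05},
    {"crack_egg": 0.30, "whisk_egg": 0.25, "pour_milk": 0.25, "stir_coffee": 0.20},
]

predictions = forecast(frames)
for t, (level, label) in enumerate(predictions):
    print(t, level, label)
```

In this toy input, the first frame is an ``island'': the fine-grained label is confident enough to be kept. The second frame backs off to the coarse label because the two egg-related actions are individually uncertain but jointly likely, and the third frame falls back to the most abstract prediction.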