CV | Jul 16, 2024

Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation

arXiv:2407.11954v1 | 11 citations | h-index: 10 | Has Code
Originality: Highly original
AI Analysis

The paper addresses the problem of predicting multiple future actions under uncertainty, for applications such as autonomous driving and human-robot interaction. It presents a novel method rather than an incremental improvement.

The paper tackles long-term action anticipation under uncertainty by proposing a Gated Temporal Diffusion network that jointly models observation and future ambiguities, achieving state-of-the-art results on Breakfast, Assembly101, and 50Salads datasets in both stochastic and deterministic settings.

Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction. Unlike short-term anticipation, predicting more actions into the future poses a real challenge because uncertainty increases over longer horizons. While there has been significant progress in predicting more actions into the future, most of the proposed methods address the task in a deterministic setup and ignore the underlying uncertainty. In this paper, we propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions. As the generator, we introduce a Gated Anticipation Network (GTAN) to model both observed and unobserved frames of a video in a mutual representation. On the one hand, using a mutual representation for past and future allows us to jointly model ambiguities in the observation and the future; on the other hand, GTAN can by design treat the observed and unobserved parts differently and steer the information flow between them. Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings. Code: https://github.com/olga-zats/GTDA

Code Implementations: 1 repo
