Towards Scene Graph Anticipation
This work addresses the challenge of predicting future object interactions in dynamic scenes, with applications in robotics and video understanding, though it is incremental in that it adapts existing scene graph generation methods.
The paper tackles long-term anticipation of fine-grained pair-wise relationships between objects in videos. It introduces the Scene Graph Anticipation (SGA) task and proposes SceneSayer, which uses NeuralODE and NeuralSDE to model how relationships evolve, and it validates the approach on the Action Genome dataset.
Spatio-temporal scene graphs represent interactions in a video by decomposing scenes into individual objects and their pair-wise temporal relationships. Long-term anticipation of the fine-grained pair-wise relationships between objects is a challenging problem. To this end, we introduce the task of Scene Graph Anticipation (SGA). We adapt state-of-the-art scene graph generation methods as baselines to anticipate future pair-wise relationships between objects and propose a novel approach, SceneSayer. In SceneSayer, we leverage object-centric representations of relationships to reason about the observed video frames and model the evolution of relationships between objects. We take a continuous-time perspective and model the latent dynamics of the evolution of object interactions using concepts from NeuralODE and NeuralSDE. We infer representations of future relationships by solving an Ordinary Differential Equation in the first case and a Stochastic Differential Equation in the second. Extensive experimentation on the Action Genome dataset validates the efficacy of the proposed methods.
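To make the continuous-time idea concrete, the following is a minimal sketch, not the SceneSayer implementation. It assumes a latent relationship vector for one object pair and evolves it forward in time with a fixed-step Euler solver (the ODE view) and with Euler-Maruyama (the SDE view). The functions `drift` and `diffusion` are hypothetical stand-ins for learned networks, and all names and constants are illustrative assumptions.

```python
import math
import random


def drift(z):
    """Hypothetical drift f(z); a learned network would go here.
    A simple linear decay is used so the result can be checked by hand."""
    return [-0.5 * zi for zi in z]


def diffusion(z):
    """Hypothetical diffusion g(z); constant noise scale per coordinate."""
    return [0.1 for _ in z]


def euler_ode(z0, drift_fn, t0, t1, steps):
    """Integrate dz/dt = f(z) from t0 to t1 with fixed-step Euler."""
    z = list(z0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        f = drift_fn(z)
        z = [zi + dt * fi for zi, fi in zip(z, f)]
    return z


def euler_maruyama_sde(z0, drift_fn, diffusion_fn, t0, t1, steps, rng):
    """Integrate dz = f(z) dt + g(z) dW from t0 to t1 with Euler-Maruyama."""
    z = list(z0)
    dt = (t1 - t0) / steps
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        f = drift_fn(z)
        g = diffusion_fn(z)
        # Brownian increment per coordinate: sqrt(dt) * N(0, 1)
        z = [
            zi + dt * fi + gi * sqrt_dt * rng.gauss(0.0, 1.0)
            for zi, fi, gi in zip(z, f, g)
        ]
    return z


# Latent representation of one object-pair relationship at the last
# observed frame (illustrative values, not from the paper).
z_observed = [1.0, -2.0, 0.5]

# Deterministic anticipated representation one time unit ahead.
z_future_ode = euler_ode(z_observed, drift, 0.0, 1.0, 100)

# One stochastic sample of the anticipated representation.
z_future_sde = euler_maruyama_sde(
    z_observed, drift, diffusion, 0.0, 1.0, 100, random.Random(0)
)
```

In a full model, the anticipated latent vector would be passed to a decoder that predicts relationship labels for the future frame; the SDE variant can be sampled several times to reflect uncertainty about how an interaction unfolds.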