Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos
This work aims to improve video editing workflows and narrative suggestions for creative professionals; the contribution is incremental, as it builds on existing transformer and knowledge-graph methods.
The paper tackles long-term human action anticipation from short video segments by integrating a symbolic knowledge graph into a transformer network, achieving up to a 9% improvement over state-of-the-art methods on the Breakfast and 50Salads datasets.
This work focuses on anticipating long-term human actions from short video segments, which can speed up editing workflows through improved suggestions and foster creativity by suggesting narratives. To this end, we imbue a transformer network with a symbolic knowledge graph for action anticipation in video segments by boosting certain aspects of the transformer's attention mechanism at run-time. Evaluated on two benchmark datasets, Breakfast and 50Salads, our approach outperforms current state-of-the-art methods for long-term action anticipation using short video context by up to 9%.
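The abstract does not specify which aspects of the attention mechanism are boosted or how the knowledge graph enters the computation. As a purely illustrative sketch, one common way to inject symbolic priors at run-time is to add a bias to the attention logits before the softmax. Everything below is an assumption made for illustration and not the authors' method: the function name `boosted_attention`, the `prior` matrix (imagined here as scores looked up from a knowledge graph), and the scaling factor `alpha` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def boosted_attention(q, k, v, prior, alpha=1.0):
    """Scaled dot-product attention with an additive prior on the logits.

    q:     (n_q, d)    query vectors
    k:     (n_k, d)    key vectors
    v:     (n_k, d_v)  value vectors
    prior: (n_q, n_k)  hypothetical knowledge-graph scores; a larger entry
                       means the graph suggests that query-key pair is related
    alpha: strength of the boost (alpha=0 recovers plain attention)

    NOTE: this is an assumed form of "boosting", not the paper's formulation.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)      # standard scaled dot-product scores
    logits = logits + alpha * prior    # run-time boost from the prior
    weights = softmax(logits, axis=-1) # each row sums to 1
    return weights @ v, weights


# Toy usage: two queries, three keys; the prior favours key index 1.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 5))
prior = np.zeros((2, 3))
prior[:, 1] = 1.0

_, plain = boosted_attention(q, k, v, prior, alpha=0.0)
_, boosted = boosted_attention(q, k, v, prior, alpha=2.0)
print("weight on key 1 without boost:", plain[:, 1])
print("weight on key 1 with boost:   ", boosted[:, 1])
```

Because the bias is applied only at inference on the logits, a scheme of this kind leaves the trained weights untouched; whether the paper's approach shares that property is not stated in the abstract.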