CV | HC | LG | Apr 10, 2024

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

ETH Zurich
arXiv:2404.07347v1 | 9 citations | h-index: 6 | ETRA
Originality: Incremental advance
AI Analysis

This work addresses video understanding for applications like robotics or surveillance by incorporating human gaze to enhance model performance, though it is incremental as it builds on existing graph neural network and gaze integration techniques.

The paper tackles action anticipation in videos by predicting an agent's actions from partial video, using a gaze-guided graph neural network to recognize intention and forecast action sequences. It achieves a 7% accuracy improvement over state-of-the-art methods for 18-class intention recognition on a dataset of household activities with human gaze data.

Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial video. We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input. Our method utilizes a Graph Neural Network to recognize the agent's intention and predict the action sequence to fulfill this intention. To assess the efficiency of our approach, we collect a dataset containing household activities generated in the VirtualHome environment, accompanied by human gaze data of viewing videos. Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition. This highlights the efficiency of our method in learning important features from human gaze data.
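The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea: a graph whose nodes are objects seen in the video, with human gaze used to weight how much each node contributes during message passing and readout. All names (`gaze_weights`, `message_pass`, `readout`, `classify`), the fixation-count weighting, the averaging update rule, and the toy numbers are assumptions made for illustration; this is not the authors' model.

```python
def gaze_weights(fixation_counts):
    """Turn per-node fixation counts into weights that sum to 1 (assumed scheme)."""
    n = len(fixation_counts)
    total = sum(fixation_counts)
    if total == 0:
        # No gaze data available: fall back to uniform weights.
        return [1.0 / n] * n
    return [c / total for c in fixation_counts]


def message_pass(features, edges, weights):
    """One round of gaze-weighted neighbor aggregation on an undirected graph.

    Each node averages its own feature with the gaze-weighted mean of its
    neighbors' features. This update rule is a hypothetical stand-in for a
    learned GNN layer.
    """
    n = len(features)
    dim = len(features[0])
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i in range(n):
        norm = sum(weights[j] for j in neighbors[i])
        if norm > 0:
            agg = [
                sum(weights[j] * features[j][d] for j in neighbors[i]) / norm
                for d in range(dim)
            ]
        else:
            # Isolated node, or all neighbors have zero gaze weight.
            agg = [0.0] * dim
        updated.append([0.5 * (features[i][d] + agg[d]) for d in range(dim)])
    return updated


def readout(features, weights):
    """Pool node features into one graph vector, weighted by gaze."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def classify(graph_vec, class_weights):
    """Return the index of the highest-scoring intention under a linear scorer."""
    scores = [sum(w * x for w, x in zip(row, graph_vec)) for row in class_weights]
    return max(range(len(scores)), key=lambda k: scores[k])


# Toy example: three object nodes with 2-d features and a chain graph 0-1-2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
weights = gaze_weights([3, 1, 0])           # node 0 was looked at most
hidden = message_pass(features, edges, weights)
graph_vec = readout(hidden, weights)
intention = classify(graph_vec, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In a trained model the update and scoring weights would be learned and the classifier would cover the 18 intention classes mentioned above; the fixed values here only make the data flow visible.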

