Image-Level Attentional Context Modeling Using Nested-Graph Neural Networks
This work addresses scene graph generation in computer vision and offers an incremental improvement over object-centric approaches by modeling context at the image level.
The paper introduces image-level attentional context modeling (ILAC) for scene graph generation. ILAC uses a nested graph neural network to propagate contextual information and achieves competitive performance on the Visual Genome dataset with fewer parameters than other methods.
We introduce a new scene graph generation method called image-level attentional context modeling (ILAC). Our model includes an attentional graph network that propagates contextual information across the graph using image-level features. Whereas previous works use an object-centric context, we build an image-level context agent that encodes the properties of the scene. The proposed method is a single-stream network that iteratively refines the scene graph with a nested graph neural network. We demonstrate that our approach achieves performance competitive with the state of the art for scene graph generation on the Visual Genome dataset while requiring fewer parameters than other methods. We also show that ILAC can improve regular object detectors by incorporating relational image-level information.
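As a rough illustration of iterative refinement with an image-level context vector, the sketch below runs a generic attention-weighted update over a small set of node features. It is not the ILAC architecture: the function names (`refine_step`, `refine`), the scaled dot-product scoring, and the residual update rule are all assumptions chosen for brevity, and the nested-graph structure described in the abstract is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def refine_step(nodes, context):
    """One hypothetical refinement step (an assumption, not the paper's update).

    nodes:   list of node feature vectors (e.g. object features)
    context: a single image-level context vector of the same dimension
    """
    d = len(context)
    # Attention of the image-level context over every node (scaled dot product).
    scores = [dot(context, h) / math.sqrt(d) for h in nodes]
    alpha = softmax(scores)
    # New image-level context: attention-weighted sum of node features.
    new_context = [sum(a * h[k] for a, h in zip(alpha, nodes)) for k in range(d)]
    # Each node receives a residual update scaled by its attention weight.
    new_nodes = [
        [h[k] + a * new_context[k] for k in range(d)]
        for a, h in zip(alpha, nodes)
    ]
    return new_nodes, new_context, alpha


def refine(nodes, context, steps=3):
    """Apply refine_step repeatedly; `steps` is an arbitrary illustrative value."""
    for _ in range(steps):
        nodes, context, _ = refine_step(nodes, context)
    return nodes, context


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.0, 1.0]]
    context = [1.0, 1.0]
    refined_nodes, refined_context = refine(nodes, context)
    print(refined_nodes, refined_context)
```

The point of the sketch is only the data flow: a single global vector attends over all nodes, is re-estimated from them, and feeds back into each node on every iteration, in contrast to a context built separately around each object.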