CL, AI, LG · Sep 13, 2019

Scene Graph Parsing by Attention Graph

arXiv:1909.06273v1 · 12 citations
Originality: Incremental advance
AI Analysis

This work addresses scene graph parsing for vision and language applications, representing an incremental improvement over existing methods.

The paper tackles the problem of automatically generating scene graphs by introducing an 'Attention Graph' mechanism that can be trained end-to-end and whose output can be read directly from the top layer of a standard Transformer model. The generated graphs achieve an F-score similarity of 52.21% to ground-truth graphs under the SPICE metric, surpassing the best previous approaches by 2.5%.
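To make the reported F-score concrete, here is a minimal sketch of an F-score over scene-graph tuples. It assumes graphs are flattened into sets of object, attribute, and relation tuples and compared by exact match; SPICE itself also matches synonyms, which this sketch omits. The function name `tuple_f_score` and the example tuples are hypothetical and not taken from the paper.

```python
def tuple_f_score(predicted, reference):
    """F-score between two collections of scene-graph tuples (exact match only)."""
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    matched = len(predicted & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)  # share of predicted tuples that are correct
    recall = matched / len(reference)     # share of reference tuples that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical tuples: 1-tuples are objects, 2-tuples are attributes,
# 3-tuples are relations.
reference = {("dog",), ("dog", "brown"), ("dog", "on", "couch"), ("couch",)}
predicted = {("dog",), ("dog", "brown"), ("couch",), ("couch", "red")}

score = tuple_f_score(predicted, reference)
print(f"F-score: {score:.2f}")
```

Three of the four predicted tuples appear in the reference, and three of the four reference tuples are recovered, so precision and recall are both 0.75.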

Scene graph representations, which form a graph of visual object nodes together with their attributes and relations, have proved useful across a variety of vision and language applications. Recent work in the area has used Natural Language Processing dependency tree methods to automatically build scene graphs. In this work, we present an 'Attention Graph' mechanism that can be trained end-to-end, and produces a scene graph structure that can be lifted directly from the top layer of a standard Transformer model. The scene graphs generated by our model achieve an F-score similarity of 52.21% to ground-truth graphs on the evaluation set using the SPICE metric, surpassing the best previous approaches by 2.5%.
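The abstract does not spell out how the graph is lifted from the Transformer's top layer. As an illustration only, the sketch below shows one plausible way to read a graph off an attention matrix: each token is linked to the position it attends to most strongly. The function `edges_from_attention`, the `<root>` token, and the hand-written attention weights are all assumptions for this example, not the paper's method.

```python
def edges_from_attention(tokens, attention, root_index=0):
    """Link each non-root token to the position it attends to most strongly.

    attention[i][j] is the weight that token i places on position j.
    Returns a list of (head_token, dependent_token) edges.
    """
    edges = []
    for i, row in enumerate(attention):
        if i == root_index:
            continue  # the root has no parent
        head = max(range(len(row)), key=row.__getitem__)  # argmax over positions
        edges.append((tokens[head], tokens[i]))
    return edges


# Hypothetical top-layer attention weights for the phrase "brown dog on couch".
tokens = ["<root>", "brown", "dog", "on", "couch"]
attention = [
    [1.00, 0.00, 0.00, 0.00, 0.00],  # <root> (ignored)
    [0.10, 0.10, 0.70, 0.05, 0.05],  # "brown" attends mostly to "dog"
    [0.80, 0.05, 0.05, 0.05, 0.05],  # "dog" attends mostly to <root>
    [0.10, 0.05, 0.70, 0.05, 0.10],  # "on" attends mostly to "dog"
    [0.05, 0.05, 0.10, 0.70, 0.10],  # "couch" attends mostly to "on"
]

edges = edges_from_attention(tokens, attention)
for head, dependent in edges:
    print(f"{head} -> {dependent}")
```

In a trained model the weights would come from the network rather than being written by hand, and attribute and relation labels would need a further prediction step that this sketch leaves out.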

