CV · LG · Oct 17, 2022

SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation

arXiv:2210.08675v1 · 6 citations · h-index: 45
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating structured semantic representations from text for applications such as image retrieval. The advance is incremental, as it builds on existing parsing methods.

The paper tackles scene graph parsing from textual descriptions by using abstract meaning representation (AMR) instead of dependency parsing. The resulting framework outperforms a dependency parsing-based model by 11.61% and the previous state-of-the-art model built on a pre-trained Transformer language model by 3.78%.

A scene graph is a structured semantic representation that can be modeled as a graph from images and texts. Image-based scene graph generation has been actively studied in recent years, whereas text-based scene graph generation has received far less attention. In this paper, we focus on the problem of parsing a scene graph from a textual description of a visual scene. The core idea is to use abstract meaning representation (AMR) instead of the dependency parsing mainly used in previous studies. AMR is a graph-based semantic formalism of natural language that abstracts the concepts of the words in a sentence, whereas dependency parsing considers dependency relationships among all words in a sentence. To this end, we design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning representation): 1) transforming a textual description of an image into an AMR graph (Text-to-AMR) and 2) encoding the AMR graph with a Transformer-based language model to generate a scene graph (AMR-to-SG). Experimental results show that the scene graphs generated by our framework outperform those of the dependency parsing-based model by 11.61% and those of the previous state-of-the-art model using a pre-trained Transformer language model by 3.78%. Furthermore, we apply SGRAM to image retrieval, one of the downstream tasks for scene graphs, and confirm the effectiveness of the scene graphs generated by our framework.
