CL, Oct 6, 2020

Scene Graph Modification Based on Natural Language Commands

arXiv:2010.02591v1994 citations
Originality: Incremental advance
AI Analysis

This work addresses the need of multi-turn user interfaces to control and update structured representations. The advance is incremental, since it builds on existing graph and transformer methods.

The paper tackles the problem of directly updating scene graphs based on natural language commands, a novel task in NLP, and introduces models that outperform earlier systems adapted from the machine translation and graph generation literature.

Structured representations like graphs and parse trees play a crucial role in many Natural Language Processing systems. In recent years, advances in multi-turn user interfaces have created a need to control and update these structured representations given new sources of information. Although many efforts have focused on improving the parsers that map text to graphs or parse trees, very few have explored the problem of directly manipulating these representations. In this paper, we explore the novel problem of graph modification, where a system must learn how to update an existing scene graph given a new command from the user. Our novel models, based on a graph-based sparse transformer and cross-attention information fusion, outperform previous systems adapted from the machine translation and graph generation literature. We further contribute our large graph modification datasets to the research community to encourage future research on this new problem.
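To make the task concrete, here is a minimal sketch of what "updating an existing scene graph given a command" can look like at the data level. It is an illustration only, not the authors' method: the paper's models learn the mapping from command to updated graph, whereas this toy applies hand-picked edits. The `SceneGraph` class, its `insert`, `delete`, and `substitute` methods, the relation names, and the example commands are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): labelled nodes and (head, relation, tail) edges."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def insert(self, head: str, relation: str, tail: str) -> None:
        # Add an edge, creating either endpoint if it is missing.
        self.nodes.update({head, tail})
        self.edges.add((head, relation, tail))

    def delete(self, node: str) -> None:
        # Remove a node together with every edge that touches it.
        self.nodes.discard(node)
        self.edges = {e for e in self.edges if node not in (e[0], e[2])}

    def substitute(self, old: str, new: str) -> None:
        # Relabel a node and rewrite the edges that mention it.
        if old not in self.nodes:
            return
        self.nodes.discard(old)
        self.nodes.add(new)
        self.edges = {
            (new if h == old else h, r, new if t == old else t)
            for h, r, t in self.edges
        }


# Source graph for an assumed description such as "a man riding a brown horse".
graph = SceneGraph()
graph.insert("man", "riding", "horse")
graph.insert("horse", "has_attribute", "brown")

# Assumed command "make the horse white": modelled here as a substitution.
graph.substitute("brown", "white")

# Assumed command "remove the man": modelled here as a deletion.
graph.delete("man")

print(sorted(graph.nodes))
print(sorted(graph.edges))
```

In this sketch the edit operation is chosen by hand for each command. In the task the abstract describes, the system receives only the source graph and the command text and must produce the updated graph itself.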

Code Implementations: 1 repo
