CL · Sep 19, 2020

CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes

arXiv:2009.09154
AI Analysis

This work provides a tool that can help the NLP and ML research communities accelerate research on language-grounded image scenes, although it is incremental, since it builds on existing datasets and methods.

The authors tackle language-grounded visual reasoning by developing a graph parser library for the CLEVR dataset. The library extracts object-centric attributes and relationships and uses them to construct structural graph representations for both modalities (images and language). These representations enable geometric learning and support downstream tasks such as language grounding and robotics.

The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains. We present a graph parser library for CLEVR that provides functionality for extracting object-centric attributes and relationships and for constructing structural graph representations for both modalities. Structural, order-invariant representations enable geometric learning and can aid downstream tasks such as grounding language to vision, robotics, compositionality, interpretability, and computational grammar construction. We provide three main extensible components (a parser, an embedder, and a visualizer) that can be tailored to specific learning setups. We also provide out-of-the-box functionality for seamless integration with popular deep graph neural network (GNN) libraries. Additionally, we discuss downstream usage and applications of the library and how it can accelerate research in the NLP community.
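To make the parser/embedder idea more concrete, here is a minimal sketch (not the library's actual API) of how a CLEVR scene annotation might be turned into GNN-ready arrays. It assumes the scene dict follows the CLEVR scene-JSON layout: an `objects` list whose entries carry `color`, `shape`, `material`, and `size`, and a `relationships` dict mapping `left`/`right`/`front`/`behind` to per-object index lists. The edge direction convention and the helper names `one_hot` and `scene_to_graph` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Attribute vocabularies of the CLEVR object properties.
COLORS = ["gray", "red", "blue", "green", "brown", "purple", "cyan", "yellow"]
SHAPES = ["cube", "sphere", "cylinder"]
MATERIALS = ["rubber", "metal"]
SIZES = ["large", "small"]
RELATIONS = ["left", "right", "front", "behind"]


def one_hot(value: str, vocab: List[str]) -> List[float]:
    """Encode a categorical attribute as a one-hot vector over `vocab`."""
    return [1.0 if v == value else 0.0 for v in vocab]


def scene_to_graph(scene: Dict) -> Tuple[List[List[float]], List[List[int]], List[int]]:
    """Hypothetical helper: convert a CLEVR-style scene dict into graph arrays.

    Returns:
      x: one feature vector per object (concatenated one-hot attributes)
      edge_index: [sources, targets] in COO format
      edge_type: index into RELATIONS for each edge
    """
    x = [
        one_hot(obj["color"], COLORS)
        + one_hot(obj["shape"], SHAPES)
        + one_hot(obj["material"], MATERIALS)
        + one_hot(obj["size"], SIZES)
        for obj in scene["objects"]
    ]
    sources: List[int] = []
    targets: List[int] = []
    edge_type: List[int] = []
    for r, rel in enumerate(RELATIONS):
        # Assumed convention: relationships[rel][i] lists the objects j that
        # stand in relation `rel` to object i; we emit a directed edge j -> i.
        for i, neighbours in enumerate(scene["relationships"].get(rel, [])):
            for j in neighbours:
                sources.append(j)
                targets.append(i)
                edge_type.append(r)
    return x, [sources, targets], edge_type


# Usage on a toy two-object scene.
scene = {
    "objects": [
        {"color": "red", "shape": "cube", "material": "metal", "size": "large"},
        {"color": "blue", "shape": "sphere", "material": "rubber", "size": "small"},
    ],
    "relationships": {
        "left": [[1], []],   # object 1 is left of object 0
        "right": [[], [0]],  # object 0 is right of object 1
        "front": [[], []],
        "behind": [[], []],
    },
}
x, edge_index, edge_type = scene_to_graph(scene)
print(len(x), len(x[0]))  # 2 15
print(edge_index)         # [[1, 0], [0, 1]]
print(edge_type)          # [0, 1]
```

The COO `edge_index` layout is chosen because graph libraries such as PyTorch Geometric consume it directly (for example via `torch_geometric.data.Data(x=..., edge_index=...)` after converting the lists to tensors). Because nodes and edges are stored as sets of indices rather than as an ordered sequence, a GNN with a permutation-invariant aggregation (sum, mean, or max) produces the same graph-level output regardless of the order in which objects are listed.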

Code implementations: 1 repo