Approximate Query Matching for Image Retrieval
This work addresses the need for more holistic image retrieval from databases when users pose complex visual queries. The contribution is incremental, since it builds on existing scene graph and graph database techniques.
The paper tackles the problem of retrieving images for complex queries that go beyond single objects. It stores scene graphs in a graph database (Neo4J) for fast approximate matching, so that a query such as "girl eating cake" retrieves images containing the specified relation as well as variations of it.
Traditional image recognition involves identifying the key object in a portrait-type image with a single object focus (ILSVRC, AlexNet, and VGG). More recent approaches consider dense image recognition: segmenting an image with appropriate bounding boxes and performing image recognition within these bounding boxes (semantic segmentation). The Visual Genome dataset [5] is an attempt to bridge these approaches in a single cohesive dataset that covers each subtask (bounding box generation, image recognition, captioning) and adds a new operation: scene graph generation. Our focus is on using such scene graphs to perform graph search on image databases, so that images are retrieved holistically based on search criteria. We develop a method to store scene graphs and metadata in a graph database (Neo4J) and to perform fast approximate retrieval of images based on a graph search query. We process more complex queries than single-object search, e.g. "girl eating cake" retrieves images that contain the specified relation as well as variations of it.
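To make the idea of approximate relation matching concrete, the following is a minimal in-memory sketch in Python. It is not the method developed in this work, and it does not use Neo4J. The toy scene graphs, the `RELATED` term table, the scoring weights, and the names `Triple`, `term_score`, `triple_score`, and `search` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene graph relation: subject -[predicate]-> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy data: image id -> set of relation triples.
SCENE_GRAPHS = {
    "img1": {Triple("girl", "eating", "cake")},
    "img2": {Triple("woman", "eating", "cake")},
    "img3": {Triple("girl", "holding", "cake")},
    "img4": {Triple("dog", "chasing", "ball")},
}

# Hypothetical table of terms accepted as variations of a query term.
RELATED = {
    "girl": {"woman", "child"},
    "eating": {"holding"},
}


def term_score(query: str, candidate: str) -> float:
    """1.0 for an exact match, 0.5 for a listed variation, else 0.0."""
    if query == candidate:
        return 1.0
    if candidate in RELATED.get(query, set()):
        return 0.5
    return 0.0


def triple_score(query: Triple, candidate: Triple) -> float:
    """Average the three term scores; any unmatched term rejects the triple."""
    scores = [
        term_score(query.subject, candidate.subject),
        term_score(query.predicate, candidate.predicate),
        term_score(query.obj, candidate.obj),
    ]
    if min(scores) == 0.0:
        return 0.0
    return sum(scores) / len(scores)


def search(query: Triple, graphs: dict, threshold: float = 0.5) -> list:
    """Return (image_id, score) pairs at or above threshold, best first."""
    results = []
    for image_id, triples in graphs.items():
        best = max((triple_score(query, t) for t in triples), default=0.0)
        if best >= threshold:
            results.append((image_id, best))
    # Sort by descending score, then by image id for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    for image_id, score in search(Triple("girl", "eating", "cake"), SCENE_GRAPHS):
        print(image_id, round(score, 2))
```

In this sketch the exact match ranks first and the two variations ("woman eating cake", "girl holding cake") follow, while the unrelated image is dropped. In a graph database the same intent would typically be expressed as a pattern query over object nodes and relation edges; the schema for that is not given here, so it is left out.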