LG · Jul 18, 2024

Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols

arXiv:2407.13382v1 · 2 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses open-world visual reasoning for applications such as robotic inspection. It appears incremental, since it combines existing neuro-symbolic and language-vision methods.

The paper tackles the problem of finding spatial configurations of multiple objects in images, such as abandoned tools on a floor. It combines neuro-symbolic programming with language-vision models by matching first-order logic formulas to object proposals. The approach is demonstrated on finding abandoned tools and leaking pipes, and most errors are attributed to biases in the language-vision model.

We consider the problem of finding spatial configurations of multiple objects in images, e.g., a mobile inspection robot is tasked to localize abandoned tools on the floor. We define the spatial configuration of objects by first-order logic in terms of relations and attributes. A neuro-symbolic program matches the logic formulas to probabilistic object proposals for the given image, which are provided by language-vision models queried for the symbols. This work is the first to combine neuro-symbolic programming (reasoning) and language-vision models (learning) to find spatial configurations of objects in images in an open-world setting. We show the effectiveness of this approach by finding abandoned tools on floors and leaking pipes. We find that most prediction errors are due to biases in the language-vision model.
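
To make the matching step concrete, here is a minimal Python sketch of scoring a logic formula against probabilistic object proposals. It is not the authors' implementation. The names `Proposal`, `p_on`, `p_near` and `score_abandoned_tool` are hypothetical, and so is the example formula. Using a product for conjunction and a maximum for the existential quantifier is an assumption made for illustration. In the paper the per-symbol probabilities come from a language-vision model; here they are written by hand.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Proposal:
    """Hypothetical object proposal: a box plus per-symbol probabilities."""
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels, y grows downward
    probs: dict  # symbol name -> probability, e.g. {"tool": 0.9}


def p_symbol(proposal, symbol):
    """Probability that the proposal matches a symbol (0.0 if never queried)."""
    return proposal.probs.get(symbol, 0.0)


def p_on(a, b):
    """Crude relation (assumption): 1.0 if a's bottom-center lies inside b's box."""
    cx = (a.box[0] + a.box[2]) / 2
    bottom = a.box[3]
    inside = b.box[0] <= cx <= b.box[2] and b.box[1] <= bottom <= b.box[3]
    return 1.0 if inside else 0.0


def p_near(a, b, max_dist=100.0):
    """Soft relation (assumption): decays linearly with the gap between boxes."""
    dx = max(a.box[0] - b.box[2], b.box[0] - a.box[2], 0)
    dy = max(a.box[1] - b.box[3], b.box[1] - a.box[3], 0)
    return max(0.0, 1.0 - hypot(dx, dy) / max_dist)


def score_abandoned_tool(proposals):
    """Score each proposal x against the illustrative formula
    tool(x) AND exists y: floor(y) AND on(x, y)
            AND NOT exists z: person(z) AND near(x, z).
    Conjunction is a product, 'exists' is a max, negation is 1 - p.
    """
    results = []
    for x in proposals:
        others = [o for o in proposals if o is not x]
        p_floor = max((p_symbol(y, "floor") * p_on(x, y) for y in others),
                      default=0.0)
        p_person = max((p_symbol(z, "person") * p_near(x, z) for z in others),
                       default=0.0)
        score = p_symbol(x, "tool") * p_floor * (1.0 - p_person)
        results.append((score, x))
    # Highest-scoring candidates first.
    return sorted(results, key=lambda r: r[0], reverse=True)


# Hand-made proposals standing in for language-vision model output.
floor = Proposal((0, 350, 640, 480), {"floor": 0.8})
person = Proposal((400, 100, 460, 400), {"person": 0.95})
lone_tool = Proposal((100, 380, 140, 400), {"tool": 0.9})
held_tool = Proposal((420, 380, 450, 400), {"tool": 0.9})

ranked = score_abandoned_tool([floor, person, lone_tool, held_tool])
for score, proposal in ranked:
    print(round(score, 3), proposal.box)
```

In this toy scene the tool lying far from the person ranks above the tool next to the person, because the negated person term suppresses the second score. Since every factor is taken directly from the symbol probabilities, a biased probability for a single symbol carries straight through to the final score.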

