CV | AI | Feb 3

Bongards at the Boundary of Perception and Reasoning: Programs or Language?

arXiv:2602.03038v1 | h-index: 3
AI Analysis

This paper addresses the problem of enabling AI systems to perform human-like visual reasoning in unfamiliar contexts. It represents an incremental advance in neurosymbolic approaches.

The paper tackles Bongard problems, which test visual reasoning in novel situations. It proposes a neurosymbolic method in which LLMs turn a hypothesized solution rule into a parameterized program and Bayesian optimization fits the program's parameters. The method is evaluated on two tasks: classifying Bongard problem images given the ground-truth rule, and solving the problems from scratch.

Vision-Language Models (VLMs) have made great strides in everyday visual tasks, such as captioning a natural image, or answering commonsense questions about such images. But humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations, a skill rigorously tested by the classic set of visual reasoning challenges known as the Bongard problems. We present a neurosymbolic approach to solving these problems: given a hypothesized solution rule for a Bongard problem, we leverage LLMs to generate parameterized programmatic representations for the rule and perform parameter fitting using Bayesian optimization. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
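
To make the "parameterized rule plus parameter fitting" idea concrete, here is a minimal sketch, not the paper's implementation. It assumes each image has already been reduced to a feature dictionary (the `area` feature, the `rule_large` function, and the toy data are all hypothetical), and it swaps in plain seeded random search for the Bayesian optimization the abstract describes, to stay dependency-free.

```python
import random

# Hypothetical parameterized rule, of the kind an LLM might be asked to write:
# "shapes on the left side are larger than some threshold".
def rule_large(features, threshold):
    return features["area"] > threshold

def accuracy(rule, threshold, left, right):
    """Fraction of images the rule places on the correct side."""
    correct = sum(rule(f, threshold) for f in left)
    correct += sum(not rule(f, threshold) for f in right)
    return correct / (len(left) + len(right))

def fit_threshold(rule, left, right, low, high, n_trials=200, seed=0):
    """Random search over the threshold (a stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best_t, best_acc = low, accuracy(rule, low, left, right)
    for _ in range(n_trials):
        t = rng.uniform(low, high)
        acc = accuracy(rule, t, left, right)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy Bongard-style problem: six "left" and six "right" images,
# each summarized by a single made-up feature.
left = [{"area": a} for a in (0.62, 0.70, 0.81, 0.66, 0.90, 0.75)]
right = [{"area": a} for a in (0.10, 0.22, 0.35, 0.18, 0.41, 0.05)]

best_t, best_acc = fit_threshold(rule_large, left, right, 0.0, 1.0)
print(f"threshold={best_t:.3f} accuracy={best_acc:.2f}")
```

A Bayesian optimizer would replace the sampling loop with a surrogate model that proposes the next threshold to try. The objective, which is how well the fitted rule separates the two sides, would stay the same.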

