CV · Oct 17, 2025

Neuro-Symbolic Spatial Reasoning in Segmentation

arXiv:2510.15841v1 · 3.6 · h-index: 7

Originality: Highly original

AI Analysis

This work addresses a known bottleneck in open-vocabulary semantic segmentation (OVSS): low segmentation accuracy on unseen objects, particularly in complex scenes that contain multiple categories. It proposes a novel method to tackle this problem.

The paper tackles Open-Vocabulary Semantic Segmentation (OVSS) by introducing neuro-symbolic spatial reasoning to compensate for the weak spatial relational understanding of vision-language models. It reports state-of-the-art average mIoU across four benchmark datasets, with clear advantages on images that contain multiple categories.

Open-Vocabulary Semantic Segmentation (OVSS) assigns pixel-level labels from an open set of categories, requiring generalization to unseen and unlabelled objects. Approaches that use vision-language models (VLMs) to correlate local image patches with potential unseen object categories lack an understanding of the spatial relations between objects in a scene. To solve this problem, we introduce neuro-symbolic (NeSy) spatial reasoning in OVSS. In contrast to contemporary VLM correlation-based approaches, we propose Relational Segmentor (RelateSeg), which imposes explicit spatial relational constraints expressed in first-order logic (FOL) and formulated within a neural network architecture. This is the first attempt to explore NeSy spatial reasoning in OVSS. Specifically, RelateSeg automatically extracts spatial relations, e.g., <cat, to-right-of, person>, and encodes them as first-order logic formulas using our proposed pseudo categories. Each pixel learns to predict both a semantic category (e.g., "cat") and a spatial pseudo category (e.g., "right of person") simultaneously, enforcing relational constraints (e.g., a "cat" pixel must lie to the right of a "person"). Finally, these logic constraints are formulated in a deep network architecture through fuzzy logic relaxation, enabling end-to-end learning of spatially and relationally consistent segmentation. RelateSeg achieves state-of-the-art performance in terms of average mIoU across four benchmark datasets and shows particularly clear advantages on images containing multiple categories, at the cost of only a single auxiliary loss function and no additional parameters, validating the effectiveness of NeSy spatial reasoning in OVSS.
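To make the pipeline described above more concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation. The relation extractor (comparing the mean column positions of categories in a label map), the pseudo-category naming `right-of-<B>`, the `bg` background label, and the helper names `extract_relations` and `relational_loss` are all hypothetical. The fuzzy relaxation uses the Reichenbach implication, truth(p → q) = 1 − p + p·q, which is one common choice in neuro-symbolic learning; the abstract does not say which fuzzy operators RelateSeg uses. Each extracted relation <A, to-right-of, B> becomes the rule ∀x: A(x) → right-of-B(x). The universal quantifier is relaxed to a mean over pixels, and the auxiliary loss is one minus the mean rule truth.

```python
from itertools import permutations


def centroid_x(label_map, cat):
    """Mean column index of all pixels labelled `cat` (assumes cat is present)."""
    xs = [x for row in label_map for x, c in enumerate(row) if c == cat]
    return sum(xs) / len(xs)


def extract_relations(label_map, background="bg"):
    """Hypothetical extractor: emit <A, to-right-of, B> whenever A's mean
    column lies strictly to the right of B's. Only this one relation is handled."""
    cats = sorted({c for row in label_map for c in row if c != background})
    return [
        (a, "to-right-of", b)
        for a, b in permutations(cats, 2)
        if centroid_x(label_map, a) > centroid_x(label_map, b)
    ]


def fuzzy_implies(p, q):
    """Reichenbach fuzzy implication: truth(p -> q) = 1 - p + p * q."""
    return 1.0 - p + p * q


def relational_loss(sem_prob, pseudo_prob, relations):
    """Auxiliary loss = 1 - mean truth of the rules forall x: A(x) -> right-of-B(x).

    sem_prob[cat] and pseudo_prob[name] are flat per-pixel probability lists
    (row-major). The pseudo-category key 'right-of-<B>' is a hypothetical naming.
    """
    truths = []
    for a, _, b in relations:
        p = sem_prob[a]                   # P(pixel is semantic class A)
        q = pseudo_prob[f"right-of-{b}"]  # P(pixel is in pseudo class right-of-B)
        # Universal quantifier relaxed to the mean truth over all pixels.
        truths.append(sum(fuzzy_implies(pi, qi) for pi, qi in zip(p, q)) / len(p))
    return 1.0 - sum(truths) / len(truths) if truths else 0.0


# Toy 2x3 scene: a person on the left, a cat on the right.
label_map = [
    ["person", "bg", "cat"],
    ["person", "bg", "cat"],
]
rels = extract_relations(label_map)
print(rels)  # [('cat', 'to-right-of', 'person')]

# Hypothetical per-pixel probabilities, flattened row-major (6 pixels).
sem_prob = {"cat": [0.1, 0.1, 0.9, 0.1, 0.1, 0.9],
            "person": [0.9, 0.1, 0.1, 0.9, 0.1, 0.1]}
consistent = {"right-of-person": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]}
inconsistent = {"right-of-person": [0.0] * 6}

print(round(relational_loss(sem_prob, consistent, rels), 3))    # 0.05
print(round(relational_loss(sem_prob, inconsistent, rels), 3))  # 0.367
```

Pixels that are confidently "cat" but not "right of person" pull the rule's truth down and raise the loss, while the consistent prediction yields a small loss. The loss itself has no learnable parameters; it only consumes per-pixel probabilities. In a real model these would be tensors, so gradients could flow through the relaxation end to end.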
