CVOct 17, 2025

Neuro-Symbolic Spatial Reasoning in Segmentation

arXiv:2510.15841v1h-index: 7
Originality Highly original
AI Analysis

This work solves the problem of improving segmentation accuracy for unseen objects in OVSS, particularly in complex scenes with multiple categories, representing a novel method for a known bottleneck.

The paper tackled the problem of Open-Vocabulary Semantic Segmentation (OVSS) by introducing neuro-symbolic spatial reasoning to address the lack of spatial relational understanding in vision-language models, achieving state-of-the-art performance with an average mIoU across four benchmark datasets and clear advantages on multi-category images.

Open-Vocabulary Semantic Segmentation (OVSS) assigns pixel-level labels from an open set of categories, requiring generalization to unseen and unlabelled objects. Using vision-language models (VLMs) to correlate local image patches with potential unseen object categories suffers from a lack of understanding of spatial relations of objects in a scene. To solve this problem, we introduce neuro-symbolic (NeSy) spatial reasoning in OVSS. In contrast to contemporary VLM correlation-based approaches, we propose Relational Segmentor (RelateSeg) to impose explicit spatial relational constraints by first order logic (FOL) formulated in a neural network architecture. This is the first attempt to explore NeSy spatial reasoning in OVSS. Specifically, RelateSeg automatically extracts spatial relations, e.g., <cat, to-right-of, person>, and encodes them as first-order logic formulas using our proposed pseudo categories. Each pixel learns to predict both a semantic category (e.g., "cat") and a spatial pseudo category (e.g., "right of person") simultaneously, enforcing relational constraints (e.g., a "cat" pixel must lie to the right of a "person"). Finally, these logic constraints are formulated in a deep network architecture by fuzzy logic relaxation, enabling end-to-end learning of spatial-relationally consistent segmentation. RelateSeg achieves state-of-the-art performance in terms of average mIoU across four benchmark datasets and particularly shows clear advantages on images containing multiple categories, with the cost of only introducing a single auxiliary loss function and no additional parameters, validating the effectiveness of NeSy spatial reasoning in OVSS.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes