CV, AI, LG | Sep 4, 2025

Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories

arXiv:2509.06998v1 | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the limited ability of current models to abstract attributes across conceptually distant categories. The advance is incremental: it builds on prior attribute prediction research by introducing new evaluation strategies.

The paper asks whether models can generalize attribute knowledge across semantically and perceptually unrelated categories, for example by recognizing that 'has four legs' is shared by dogs and chairs. It finds that performance drops sharply as the correlation between training and test categories decreases, and that clustering-based splits offer the best trade-off between reducing hidden correlations and preserving learnability.

Can models generalize attribute knowledge across semantically and perceptually dissimilar categories? While prior work has addressed attribute prediction within narrow taxonomic or visually similar domains, it remains unclear whether current models can abstract attributes and apply them to conceptually distant categories. This work presents the first explicit evaluation of the robustness of attribute prediction under such conditions, testing whether models can correctly infer shared attributes between unrelated object types, e.g., identifying that the attribute "has four legs" is common to both "dogs" and "chairs". To enable this evaluation, we introduce train-test split strategies that progressively reduce correlation between training and test sets, based on: LLM-driven semantic grouping, embedding similarity thresholding, embedding-based clustering, and supercategory-based partitioning using ground-truth labels. Results show a sharp drop in performance as the correlation between training and test categories decreases, indicating strong sensitivity to split design. Among the evaluated methods, clustering yields the most effective trade-off, reducing hidden correlations while preserving learnability. These findings offer new insights into the limitations of current representations and inform future benchmark construction for attribute reasoning.
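To make the idea of an embedding-based clustering split concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm or embeddings were used, so the tiny cosine k-means, the function names (`kmeans`, `cluster_split`), and the 2-D toy embeddings are all illustrative assumptions. The point it shows is that whole clusters go to one side of the split, so no test category shares a cluster with a training category.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def kmeans(vectors, k, iters=20):
    """Tiny deterministic k-means (assumption: centroids start at the first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its most cosine-similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_split(embeddings, k, test_clusters):
    """Hypothetical helper: put every category of a held-out cluster in the test set."""
    names = sorted(embeddings)
    labels = kmeans([embeddings[n] for n in names], k)
    train = [n for n, lab in zip(names, labels) if lab not in test_clusters]
    test = [n for n, lab in zip(names, labels) if lab in test_clusters]
    return train, test

# Toy 2-D category embeddings, invented for illustration only.
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [1.0, 0.0],
    "chair": [0.0, 1.0],
    "table": [0.1, 0.9],
}

train, test = cluster_split(embeddings, k=2, test_clusters={1})
print(train, test)
```

On this toy data the animals and the furniture fall into separate clusters, so holding out one cluster gives a test set with no close neighbours in training. In practice, the number of clusters and the choice of which clusters to hold out control how much correlation remains between the two sets.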

