CV | AI | Aug 12, 2025

Separating Knowledge and Perception with Procedural Data

arXiv:2508.11697v1 | h-index: 7 | ICML
Originality: Highly original
AI Analysis

This work addresses the challenge of separating knowledge from perception in computer vision systems. The approach is novel and shows strong zero-shot capabilities, though its improvement over existing visual memory methods is incremental.

The authors train representation models using only procedural data and apply them to visual tasks without further training. Compared to a model trained on Places, the procedural model performs within 1% on NIGHTS visual similarity, outperforms by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. Its zero-shot segmentation $R^2$ on COCO is within 10% of models trained on real data.

We train representation models with procedural data only, and apply them to visual similarity, classification, and semantic segmentation tasks without further training by using visual memory -- an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within $1\%$ on NIGHTS visual similarity, outperforms by $8\%$ and $15\%$ on CUB200 and Flowers102 fine-grained classification, and is within $10\%$ on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within $10\%$ of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.
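The visual memory idea in the abstract, classifying a query by searching an explicit database of reference embeddings, can be illustrated with a small nearest-neighbour sketch. This is not the authors' implementation: the `VisualMemory` class, the cosine-similarity majority vote, the value of `k`, and the hand-made 2-D vectors standing in for encoder outputs are all assumptions made for illustration.

```python
import math
from collections import Counter


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


class VisualMemory:
    """Hypothetical toy memory: a list of (unit embedding, label) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, label):
        self.entries.append((normalize(embedding), label))

    def remove_label(self, label):
        # Knowledge lives in the database, so deleting entries removes it
        # without retraining the encoder that produced the embeddings.
        self.entries = [(e, l) for e, l in self.entries if l != label]

    def classify(self, query, k=3):
        q = normalize(query)
        # Rank every stored embedding by cosine similarity to the query.
        scored = sorted(
            ((sum(a * b for a, b in zip(q, e)), l) for e, l in self.entries),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Majority vote among the k most similar reference embeddings.
        votes = Counter(label for _, label in scored[:k])
        return votes.most_common(1)[0][0]


# Hand-made embeddings stand in for the output of a procedurally trained encoder.
memory = VisualMemory()
memory.add([1.0, 0.1], "sparrow")
memory.add([0.9, 0.2], "sparrow")
memory.add([0.1, 1.0], "tulip")
memory.add([0.2, 0.9], "tulip")

prediction = memory.classify([0.8, 0.3], k=3)
print(prediction)
```

In this toy setup the encoder never sees the labels. All class knowledge sits in `memory.entries`, which is the sense in which knowledge and perception are kept separate. It also hints at the failure mode the abstract describes: if two parts of the same object receive dissimilar embeddings, the nearest neighbours of a query may belong to the wrong class.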

