CL, AI, LG | Sep 27, 2024

Individuation in Neural Models with and without Visual Grounding

arXiv:2409.18868v1 | 22 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work examines how neural models encode individuation, a question relevant to researchers in AI, linguistics, and cognitive science. The contribution is incremental, since it builds on existing models and theories.

The study compares CLIP, a language-and-vision model, with the text-only models FastText and SBERT. It finds that CLIP embeddings capture quantitative differences in individuation (across substrates, granular aggregates, and varying numbers of objects) better than the text-only embeddings do, and that the individuation hierarchy derived from CLIP agrees with hierarchies proposed in linguistics and cognitive science.

We show differences between a language-and-vision model CLIP and two text-only models - FastText and SBERT - when it comes to the encoding of individuation information. We study latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.
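As a rough illustration of what comparing embeddings along an individuation scale could look like, the sketch below ranks a few labelled vectors by cosine similarity to an anchor vector. It is not the authors' procedure: the function names, the labels, and the three-dimensional vectors are hypothetical placeholders, and the ordering it produces follows only from those invented numbers. In practice the vectors would come from CLIP, FastText, or SBERT.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def order_by_similarity(anchor, embeddings):
    """Return labels sorted from most to least similar to the anchor vector."""
    sims = {label: cosine_similarity(anchor, vec) for label, vec in embeddings.items()}
    return sorted(sims, key=sims.get, reverse=True)


# Hypothetical toy vectors (NOT values from the paper); real embeddings
# would be produced by the model under study.
toy_embeddings = {
    "substrate": [1.0, 0.0, 0.0],
    "granular aggregate": [0.8, 0.6, 0.0],
    "several objects": [0.3, 0.9, 0.3],
    "single object": [0.0, 0.6, 0.8],
}

# Rank every category by its similarity to the "substrate" anchor.
ordering = order_by_similarity(toy_embeddings["substrate"], toy_embeddings)
print(ordering)
```

Running the same ranking on embeddings from different models and comparing the resulting orderings is one simple way to see whether a model separates these categories at all. A fuller analysis would need many items per category and a proper statistical comparison.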

