How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian
This work addresses the challenge of using AI-generated data in psychological and linguistic research. Its contribution is incremental: it extends prior studies of basic-level categories to the subordinate level.
The study asks how humans and LLMs organize conceptual knowledge at the subordinate level (e.g., 'grizzly bear'). It introduces a new Italian dataset and evaluates LLMs on exemplar generation, category induction, and typicality judgment. The results show low alignment between humans and LLMs, with performance varying across semantic domains.
People can categorize the same entity at multiple taxonomic levels, such as basic (bear), superordinate (animal), and subordinate (grizzly bear). While prior research has focused on basic-level categories, this study is the first attempt to examine the organization of categories by analyzing exemplars produced at the subordinate level. We present a new Italian psycholinguistic dataset of human-generated exemplars for 187 concrete words. We then use these data to evaluate whether textual and vision LLMs produce meaningful exemplars that align with human category organization across three key tasks: exemplar generation, category induction, and typicality judgment. Our findings show low alignment between humans and LLMs, consistent with previous studies. However, LLM performance varies notably across semantic domains. Ultimately, this study highlights both the promise and the constraints of using AI-generated exemplars to support psychological and linguistic research.
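To make the notion of human-LLM alignment in exemplar generation concrete, the following is a minimal sketch of one way such alignment could be quantified. It is not the paper's method: the abstract does not state which metric was used, so the Jaccard overlap here is an assumption, as are the helper names (`normalize`, `production_frequency`, `jaccard_overlap`) and the toy responses, which are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def normalize(item):
    """Lowercase and trim an exemplar so spelling variants are merged."""
    return item.strip().lower()


def production_frequency(responses):
    """Share of participants who produced each exemplar.

    `responses` is a list of per-participant exemplar lists.
    """
    counts = Counter()
    for participant in responses:
        # Count each exemplar at most once per participant.
        counts.update({normalize(item) for item in participant})
    n = len(responses)
    return {item: c / n for item, c in counts.items()}


def jaccard_overlap(human_items, model_items):
    """Jaccard overlap between human and model exemplar sets (assumed metric)."""
    human = {normalize(i) for i in human_items}
    model = {normalize(i) for i in model_items}
    if not human and not model:
        return 0.0
    return len(human & model) / len(human | model)


# Hypothetical toy data for the category "orso" (bear); not from the dataset.
human_responses = [
    ["orso bruno", "orso polare"],
    ["orso polare", "grizzly"],
    ["Orso bruno", "panda"],
]
model_exemplars = ["orso polare", "grizzly", "orso nero"]

freq = production_frequency(human_responses)
overlap = jaccard_overlap(freq.keys(), model_exemplars)
print(f"Jaccard overlap: {overlap:.2f}")
```

Computing such a score separately for each semantic domain would be one way to expose the domain-level variation the abstract reports. A frequency-weighted variant, using `freq` as weights, could reward models for matching the exemplars that people produce most often.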