CL · Feb 10, 2025

Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type

arXiv:2502.06086v2 · 11 citations · h-index: 6 · Has Code · NAACL
Originality: Incremental advance
AI Analysis

This paper addresses a gap in evaluating LLMs' cognitive reasoning, which is relevant to AI researchers. It appears incremental, however, because it builds on existing conceptual combination studies by adding a new dataset and method.

The paper evaluates how well large language models handle conceptual combination. It introduces the CCPT dataset of 12.3K annotated triplets, finds that LLMs struggle to generate noun phrases with emergent properties, and proposes a method that improves performance on generative tasks.

Conceptual combination is a cognitive process that merges basic concepts, enabling the creation of complex expressions. During this process, the properties of the combination (e.g., the whiteness of a peeled apple) can be inherited from the basic concepts, newly emerge, or be canceled. However, previous studies have evaluated a limited set of properties and have not examined the generative process. To address this gap, we introduce the Conceptual Combination with Property Type dataset (CCPT), which consists of 12.3K annotated triplets of noun phrases, properties, and property types. Using CCPT, we establish three types of tasks to thoroughly evaluate LLMs on conceptual combination. Our key findings are threefold: (1) Our automatic metric for grading property emergence and cancellation corresponds closely with human judgments. (2) LLMs, including OpenAI's o1, struggle to generate noun phrases that possess given emergent properties. (3) Our proposed method, inspired by a cognitive psychology model that explains how relationships between concepts are formed, improves performance on all generative tasks. The dataset and experimental code are available at https://github.com/seokwon99/CCPT.git.
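
To make the triplet structure described in the abstract concrete, here is a minimal Python sketch of how a (noun phrase, property, property type) record might be represented. The class and field names (`PropertyType`, `Triplet`, `noun_phrase`, `prop`, `property_type`) and the `count_by_type` helper are hypothetical, and the three example rows are illustrative labels chosen for this sketch, not entries taken from CCPT; the actual schema in the repository may differ.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PropertyType(Enum):
    """The three property types named in the abstract."""
    INHERITED = "inherited"  # carried over from a basic concept
    EMERGENT = "emergent"    # arises only in the combination
    CANCELED = "canceled"    # held by a basic concept, lost in the combination


@dataclass(frozen=True)
class Triplet:
    """Hypothetical record layout; not the official CCPT schema."""
    noun_phrase: str
    prop: str
    property_type: PropertyType


# Illustrative examples only (assumed labels, not drawn from the dataset).
examples = [
    Triplet("peeled apple", "white", PropertyType.EMERGENT),
    Triplet("peeled apple", "red", PropertyType.CANCELED),
    Triplet("peeled apple", "sweet", PropertyType.INHERITED),
]


def count_by_type(triplets):
    """Count how many triplets fall under each property type."""
    return Counter(t.property_type for t in triplets)


if __name__ == "__main__":
    for ptype, n in count_by_type(examples).items():
        print(ptype.value, n)
```

Using an enum for the property type keeps the three categories closed, so a mislabeled record fails at construction time rather than silently forming a fourth category.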

Code Implementations: 1 repo
