CL · AI · Mar 15

Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

Ruchira Dhar, Qiwei Peng, Anders Søgaard

arXiv:2603.09994 · 88.6 · h-index: 4

Predicted impact: top 32% in CL · last 90 days

Originality: Incremental advance

AI Analysis

This work examines the compositional abilities of LLMs, a question relevant to both researchers and practitioners. It highlights a divergence between internal representations and task performance; the contribution is incremental, lying mainly in its emphasis on contrastive evaluation methods.

The paper evaluates adjective-noun compositionality in large language models (LLMs) by comparing prompt-based functional assessments with representational analyses of internal states. It finds that LLMs develop compositional representations but do not consistently translate them into functional task success across model variants.

Compositionality is considered central to language abilities. As performant language systems, how do large language models (LLMs) do on compositional tasks? We evaluate adjective-noun compositionality in LLMs using two complementary setups: prompt-based functional assessment and a representational analysis of internal model states. Our results reveal a striking divergence between task performance and internal states. While LLMs reliably develop compositional representations, they fail to translate consistently into functional task success across model variants. Consequently, we highlight the importance of contrastive evaluation for obtaining a more complete understanding of model capabilities.

