Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective
For researchers evaluating LLM compositionality, this paper provides a more explainable, partition-free estimation method, though it is an incremental improvement over existing tests.
The paper proposes a rule-generation perspective for estimating compositionality in LLMs, addressing limitations of existing compositional generalization tests (lack of explainability and combination leakage). Experiments on a string-to-grid task reveal compositionality deficiencies in current LLMs.
Compositional generalization tests are often used to estimate the compositionality of LLMs. However, such tests have two limitations: (1) they examine only the output and ignore whether the LLM understands the compositional structure of the samples, which limits explainability; (2) they rely on partitioning the dataset so that the test set contains combinations unseen in the training set, which makes them vulnerable to combination leakage. In this work, we propose a novel rule-generation perspective on compositionality estimation for LLMs. It requires the LLM to generate a program that serves as the rules of the dataset mapping, and it estimates the LLM's compositionality using complexity-based theory. This perspective addresses the limitations of compositional generalization tests and provides a new way to characterize the compositionality of LLMs. Based on this perspective, we evaluate and analyze existing advanced LLMs on a string-to-grid task, and we find that they exhibit various compositionality characteristics and compositionality deficiencies.
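To make the rule-generation idea concrete, the following is a minimal sketch under stated assumptions, not the paper's actual benchmark or metric. The toy string-to-grid mapping (`ROWS`), the two candidate programs, and the helper names `accuracy` and `complexity` are all hypothetical. The zlib-compressed source length is used only as a crude stand-in for a complexity measure, since the abstract does not specify which complexity-based theory is applied. The sketch shows that two programs can both reproduce the whole dataset while differing sharply in description length, which is the kind of signal a complexity-based estimate can use without any train/test partition.

```python
import zlib
from itertools import product

# Hypothetical toy task (an assumption, not the paper's task):
# each letter of the input string paints one row of the output grid.
ROWS = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}


def reference(s):
    """Ground-truth mapping from a string to a grid (list of rows)."""
    return [ROWS[ch] for ch in s]


# Full dataset: every string over {a, b, c} of length 1 to 4. No partition is needed.
DATASET = {
    "".join(p): reference("".join(p))
    for n in (1, 2, 3, 4)
    for p in product("abc", repeat=n)
}

# Two candidate "rule programs", written as source text an LLM might return.
# COMPOSITIONAL reuses one primitive rule per letter.
COMPOSITIONAL = '''
ROWS = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}
def rule(s):
    return [ROWS[ch] for ch in s]
'''

# MEMORIZING stores every input/output pair verbatim in a lookup table.
MEMORIZING = "TABLE = " + repr(DATASET) + "\ndef rule(s):\n    return TABLE[s]\n"


def accuracy(source, dataset):
    """Run a generated program and return the fraction of samples it maps correctly."""
    namespace = {}
    exec(source, namespace)  # acceptable for a toy; sandbox real LLM output
    rule = namespace["rule"]
    hits = 0
    for s, grid in dataset.items():
        try:
            hits += rule(s) == grid
        except Exception:
            pass  # a crashing rule counts as a miss
    return hits / len(dataset)


def complexity(source):
    """Crude complexity proxy (an assumption): zlib-compressed size of the source in bytes."""
    return len(zlib.compress(source.encode("utf-8"), 9))


if __name__ == "__main__":
    for name, src in (("compositional", COMPOSITIONAL), ("memorizing", MEMORIZING)):
        print(name, accuracy(src, DATASET), complexity(src))
```

In this toy setting both programs are fully accurate, so an output-only test could not tell them apart, whereas the complexity proxy favors the program that expresses the mapping through reusable rules.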