CL AI

May 8, 2023

How Do In-Context Examples Affect Compositional Generalization?

Shengnan An, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Jian-Guang Lou, Dongmei Zhang

arXiv:2305.04835v3

28.3250 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of improving few-shot learning for compositional reasoning in AI. The advance is incremental, since it builds on the existing in-context learning paradigm.

The paper studies how in-context learning in large language models affects compositional generalization. It finds that performance is highly sensitive to example selection, with structural similarity, diversity, and simplicity as the key factors, and it reveals limitations such as weaker generalization on fictional words.

Compositional generalization, the ability to understand unseen combinations of seen primitives, is an essential reasoning capability in human intelligence. The AI community mainly studies this capability by fine-tuning neural networks on large numbers of training samples, while it remains unclear whether and how in-context learning, the prevailing few-shot paradigm based on large language models, exhibits compositional generalization. In this paper, we present CoFe, a test suite for investigating in-context compositional generalization. We find that compositional generalization performance is easily affected by the selection of in-context examples, which raises the research question of what the key factors are in making good in-context examples for compositional generalization. We study three potential factors: similarity, diversity, and complexity. Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple. Furthermore, two strong limitations are observed: in-context compositional generalization on fictional words is much weaker than on commonly used ones, and it is still critical that the in-context examples cover the required linguistic structures, even though the backbone model has been pre-trained on a large corpus. We hope our analysis will facilitate the understanding and utilization of the in-context learning paradigm.
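
To make the three factors concrete, here is a minimal Python sketch of a greedy example selector that prefers candidates that are structurally similar to the test case, dissimilar to examples already chosen, and short. It is an illustration only and not the procedure used in CoFe: the `Example` class, the `structure` feature sets, the Jaccard overlap as a stand-in for structural similarity, the token count as a stand-in for complexity, and both weights are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str            # natural-language input
    target: str            # output, e.g. a logical form
    structure: frozenset   # hypothetical structural features of the target


def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(pool, test_structure, k=3,
                    diversity_weight=0.5, length_weight=0.01):
    """Greedily pick up to k examples.

    Each step rewards similarity to the test case, penalizes overlap with
    examples already selected (diversity), and penalizes long targets
    (simplicity). The weights are arbitrary illustrative values.
    """
    selected = []
    remaining = list(pool)
    while remaining and len(selected) < k:
        def score(ex):
            similarity = jaccard(ex.structure, test_structure)
            redundancy = max(
                (jaccard(ex.structure, s.structure) for s in selected),
                default=0.0,
            )
            complexity = len(ex.target.split())
            return (similarity
                    - diversity_weight * redundancy
                    - length_weight * complexity)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a pool that contains a short and a long example sharing the same structure plus one example with a different structure, this selector picks the short one first and then the structurally different one, because the long duplicate is penalized for both redundancy and length. How structure is actually measured would depend on the dataset and is left open here.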
