CL, AI | Mar 6, 2023

Towards Zero-Shot Functional Compositionality of Language Models

Hangyeol Yu, Myeongho Jeong, Jamin Shin, Hyeongdon Moon, Juneyoung Park, Seungtaek Choi

CMU

arXiv:2303.03103v1 | 0.92 citations | h-index: 18 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work highlights a fundamental limitation of current language models that bears on the development of more human-like systems. Its contribution is incremental, however, because it primarily critiques existing paradigms and proposes future research rather than presenting a new solution.

The paper identifies the lack of functional compositionality in large pre-trained language models as a critical open problem. It shows that current models such as GPT-2 and T5 fall short of human-level generalizability on composed tasks such as cross-lingual summarization, and it suggests research directions to address this gap.

Large Pre-trained Language Models (PLMs) have become the most desirable starting point in the field of NLP, as they have become remarkably good at solving many individual tasks. Despite such success, in this paper, we argue that current paradigms of working with PLMs are neglecting a critical aspect of modeling human intelligence: functional compositionality. Functional compositionality, the ability to compose learned tasks, has been a long-standing challenge in the field of AI (and many other fields), as it is considered one of the hallmarks of human intelligence. An illustrative example is cross-lingual summarization, where a bilingual person (English-French) could directly summarize an English document into French sentences without having to translate the English document or summary into French explicitly. We discuss why this is an important open problem that requires further attention from the field. Then, we show that current PLMs (e.g., GPT-2 and T5) do not yet exhibit functional compositionality and remain far from human-level generalizability. Finally, we suggest several research directions that could push the field towards zero-shot functional compositionality of language models.
