CL · Dec 12, 2023

Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization

arXiv:2312.07763v1 · 6 citations · h-index: 12 · ICASSP
Originality: Highly original
AI Analysis

This paper addresses the problem of improving compositional generalization in LLMs on natural language processing tasks and offers a novel framework with significant performance gains.

The paper investigates the compositional generalization of large language models (LLMs) under in-context learning. It finds that they struggle with complex questions because of errors in reasoning and tool-making, and proposes a human-guided tool manipulation framework that achieves state-of-the-art performance, outperforming existing methods by 70% on the most challenging test split.

The meaning of complex phrases in natural language is composed of their individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods in compositional generalization. We find that they struggle with complex compositional questions due to cumulative errors in long reasoning steps and intricate logic required for tool-making. Consequently, we propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method enhances the effectiveness of tool creation and usage with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.
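The idea of building one tool per sub-question and then integrating the tools can be illustrated with a toy sketch. This is not the paper's HTM implementation: the `Obj` schema, `make_filter`, `compose`, and the example scene are all hypothetical names chosen for illustration, and the "tools" here are plain Python functions rather than LLM-generated code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Obj:
    """A toy scene object (hypothetical schema, not taken from the paper)."""
    name: str
    color: str
    shape: str


# In this sketch a "tool" is a function from a list of objects to a list of objects.
Tool = Callable[[List[Obj]], List[Obj]]


def make_filter(attribute: str, value: str) -> Tool:
    """Build a tool for one sub-question, e.g. 'which objects are green?'."""
    def tool(objects: List[Obj]) -> List[Obj]:
        return [o for o in objects if getattr(o, attribute) == value]
    return tool


def compose(tools: Iterable[Tool]) -> Tool:
    """Integrate several sub-question tools by applying them in order."""
    tool_list = list(tools)

    def pipeline(objects: List[Obj]) -> List[Obj]:
        for t in tool_list:
            objects = t(objects)
        return objects
    return pipeline


scene = [
    Obj("a", "green", "square"),
    Obj("b", "red", "circle"),
    Obj("c", "green", "circle"),
]

# A combination of components: "green" and "circle" are each handled by one tool.
find_green_circle = compose([
    make_filter("color", "green"),
    make_filter("shape", "circle"),
])
result = find_green_circle(scene)
print([o.name for o in result])  # prints ['c']
```

Because each tool answers a single sub-question, a new combination such as "red square" reuses the same building blocks without writing any new logic, which is the property compositional generalization benchmarks test for.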

