AI, CL, LG | May 24, 2024

Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism

arXiv:2405.15302v3 | 9 citations | h-index: 17 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing reasoning capabilities in language models for applications like mathematical problem-solving, though it is incremental as it builds on existing methods like Chain-of-Thought.

The study asks why large language models struggle with complex reasoning and probes their internal mechanisms with a symbolic multi-step reasoning task. From that analysis it derives a random matrix-based algorithm that introduces only 132 trainable parameters and achieves significant performance improvements on 7 multi-step reasoning datasets.

Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capability. In this study, we constructed a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models when solving the task through direct answering and Chain-of-Thought (CoT) reasoning. We introduced the concept of a buffer mechanism: the model stores various pieces of information in distinct buffers and selectively extracts them through the query-key matrix. We proposed a random matrix-based algorithm to enhance the model's reasoning ability. This algorithm introduces only 132 trainable parameters, yet leads to significant performance improvements on 7 multi-step reasoning datasets, including PrOntoQA, LogicAsker, and LogicInference. These findings provide new insights into understanding large language models.
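The buffer idea in the abstract can be illustrated with a toy sketch. It is not the paper's 132-parameter algorithm, whose details are not given here. The sketch assumes that a "buffer" is a random projection of a symbol embedding, that each position stores a fact `src -> dst` as the sum of two such projections, and that a query reads one buffer to find the matching position and the other to extract the answer. All names (`B_SRC`, `B_DST`, `step`, `reason`) and the fact list are hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)
D = 256  # hidden size (assumed); larger D makes random buffers closer to orthogonal


def rand_matrix(d):
    """d x d Gaussian matrix scaled so that M^T M is approximately the identity."""
    s = 1.0 / math.sqrt(d)
    return [[random.gauss(0.0, s) for _ in range(d)] for _ in range(d)]


def rand_unit(d):
    """Random unit vector used as a symbol embedding."""
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Compute m @ v (write a vector into a buffer)."""
    return [dot(row, v) for row in m]


def mat_t_vec(m, v):
    """Compute m^T @ v (approximately read a vector back out of a buffer)."""
    out = [0.0] * len(m[0])
    for row, vi in zip(m, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out


SYMBOLS = "abcdef"
EMBED = {s: rand_unit(D) for s in SYMBOLS}

# Two hypothetical buffers: one holds the source symbol, one the target symbol.
B_SRC = rand_matrix(D)
B_DST = rand_matrix(D)

# Each context position stores one fact "src -> dst" as a sum over both buffers.
FACTS = [("c", "d"), ("a", "b"), ("e", "f"), ("b", "c")]
HIDDEN = [
    [x + y for x, y in zip(matvec(B_SRC, EMBED[s]), matvec(B_DST, EMBED[t]))]
    for s, t in FACTS
]


def step(symbol):
    """One reasoning hop: match on the source buffer, read the target buffer."""
    query = matvec(B_SRC, EMBED[symbol])          # query lives in the source buffer
    scores = [dot(query, h) for h in HIDDEN]      # attention-like matching scores
    best = max(range(len(HIDDEN)), key=lambda j: scores[j])
    read = mat_t_vec(B_DST, HIDDEN[best])         # approximate target embedding
    return max(SYMBOLS, key=lambda s: dot(read, EMBED[s]))


def reason(start, n_steps):
    """Chain several hops, in the spirit of step-by-step (CoT-style) reasoning."""
    current = start
    for _ in range(n_steps):
        current = step(current)
    return current


print(reason("a", 3))  # follows a -> b -> c -> d
```

Because independent random projections are nearly orthogonal in high dimensions, a symbol written into the target buffer contributes almost nothing to a query on the source buffer. This is why the fact `a -> b` is not mistaken for a fact starting at `b`. Reducing `D` makes the buffers interfere more, and the lookup becomes less reliable.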
