AI, CL, LG · May 24, 2024

Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism

Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu

arXiv:2405.15302v314.010 citations · h-index: 17 · EMNLP

Originality: Incremental advance

AI Analysis

This work aims to improve the reasoning capabilities of language models for applications such as mathematical problem-solving. It is an incremental advance because it builds on existing methods such as Chain-of-Thought.

The study investigates why large language models struggle with complex reasoning by analyzing their internal mechanisms on a symbolic multi-step reasoning task. It proposes a random matrix-based algorithm that introduces only 132 trainable parameters and yields significant performance improvements on 7 multi-step reasoning datasets.

Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capability. In this study, we constructed a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models when solving the task through direct answering and Chain-of-Thought (CoT) reasoning. We introduced the concept of a buffer mechanism: the model stores various information in distinct buffers and selectively extracts it through the query-key matrix. We proposed a random matrix-based algorithm to enhance the model's reasoning ability. This algorithm introduces only 132 trainable parameters, yet leads to significant performance improvements on 7 multi-step reasoning datasets, including PrOntoQA, LogicAsker, and LogicInference. These findings provide new insights into understanding large language models.
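
The buffer idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: it assumes that each "buffer" is a random orthogonal matrix, that one hidden vector stores two token embeddings in two such buffers, and that a query-key product using the same matrix reads one buffer back while the other contributes only small noise. The names `random_orthogonal`, `build_demo`, and `lookup`, and all sizes, are hypothetical choices made for this illustration.

```python
import numpy as np


def random_orthogonal(d, rng):
    """Draw a random d x d orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def build_demo(d=256, n_tokens=20, n_positions=5, seed=0):
    """Build toy embeddings, two buffers, and hidden states holding two tokens each."""
    rng = np.random.default_rng(seed)
    # Random unit-norm token embeddings (assumed, for illustration only).
    emb = rng.standard_normal((n_tokens, d))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    buf1 = random_orthogonal(d, rng)
    buf2 = random_orthogonal(d, rng)
    # Each position stores two distinct tokens: one per buffer.
    ids = rng.permutation(n_tokens)[: 2 * n_positions].reshape(n_positions, 2)
    hidden = emb[ids[:, 0]] @ buf1 + emb[ids[:, 1]] @ buf2
    return emb, buf1, buf2, ids, hidden


def lookup(query_vec, hidden, buf):
    """Score every position against the query through one buffer.

    hidden_i @ buf.T recovers the embedding stored in that buffer plus a
    small cross-term from the other buffer; the dot product with query_vec
    is then large only where the stored token matches the query.
    """
    scores = hidden @ buf.T @ query_vec
    return int(np.argmax(scores)), scores
```

Under these assumptions, `lookup(emb[ids[i, 0]], hidden, buf1)` should select position `i`, and the same query scored through `buf2` should stay close to zero, which is the sense in which different matrices address different buffers in one vector.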
