CL IR NA · Oct 11, 2022

Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems

Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang

arXiv:2210.05075v11.46 citations · h-index: 29

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of unreliable numerical generalization in language models, which matters for applications requiring precise quantitative reasoning, though it is an incremental advance that builds on existing model capabilities.

The paper tackles the difficulty language models have with numerical reasoning over a broad range of numbers. It proposes a training-free method that uses simple anchor numbers to elicit arithmetic expressions from a model and then applies those expressions to the complex numbers, yielding significant improvements on benchmarks across models such as GPT-3 and T5.

Numerical reasoning over natural language has been a long-standing goal for the research community. However, cutting-edge language models have struggled to generalize reliably to a broad range of numbers, although they have shown proficiency in reasoning over common and simple numbers. In this paper, we propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models using simple anchor numbers. Concretely, we first leverage simple numbers as anchors to probe the arithmetic expressions implicitly inferred by language models, and then explicitly apply the expressions to complex numbers to get the corresponding answers. To inversely elicit arithmetic expressions, we transform and formulate the task as an analytically solvable linear system. Experimental results on several numerical reasoning benchmarks demonstrate that our approach significantly improves the numerical reasoning capabilities of existing LMs. More importantly, our approach is training-free and works purely in the inference phase, making it highly portable and achieving consistent performance benefits across a variety of language models (GPT-3, T5, BART, etc.) in zero-shot, few-shot, and fine-tuning scenarios.
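The anchor-number idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for a language model that answers correctly only on small operands, the anchor values are arbitrary, and the hidden expression is assumed to be linear in two operands (y = a*x1 + b*x2 + c). The sketch queries the model on anchors, solves the resulting linear system exactly, and applies the recovered expression to the original large numbers.

```python
from fractions import Fraction


def toy_model(x1, x2):
    """Hypothetical stand-in for a language model (assumption, not a real LM).

    It computes x1 - x2 correctly on small operands and drifts on large ones.
    """
    if abs(x1) <= 20 and abs(x2) <= 20:
        return x1 - x2
    return round((x1 - x2) * 0.97)  # simulated unreliability on complex numbers


def solve_linear_system(rows, rhs):
    """Solve a square linear system exactly with Gauss-Jordan elimination.

    Raises StopIteration if the system is singular (anchors not independent).
    """
    n = len(rows)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def answer_via_anchors(model, operands, anchors):
    """Recover (a, b, c) from anchor queries, then apply them to the operands."""
    rows = [[x1, x2, 1] for x1, x2 in anchors]       # unknowns: a, b, c
    rhs = [model(x1, x2) for x1, x2 in anchors]      # model answers on anchors
    a, b, c = solve_linear_system(rows, rhs)
    x1, x2 = operands
    return [a, b, c], a * x1 + b * x2 + c


# Illustrative anchors: small numbers that give an invertible system.
ANCHORS = [(1, 2), (3, 1), (2, 5)]
coefficients, recovered = answer_via_anchors(toy_model, (48291, 1977), ANCHORS)
print("coefficients:", [str(v) for v in coefficients])
print("direct model answer:", toy_model(48291, 1977))
print("answer via anchors:", recovered)
```

Exact rational arithmetic is used here so the recovered coefficients are not blurred by floating-point error; a real setting would also need to handle noisy model answers on the anchors, which this sketch does not attempt.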
