CL · Feb 18, 2024

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

arXiv:2402.11442v319.643 citations · h-index: 22 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses LLMs' limited logical reasoning and offers AI researchers a method to improve it, though the advance is incremental.

The authors investigated whether large language models (LLMs) can reason with inferential rules. They found significant gaps in logic understanding compared to humans, especially for complex rules, and developed an inference engine that improved performance on commonsense reasoning tasks.

Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks. However, their mastery of underlying inferential rules still falls short of human capabilities. To investigate this, we propose a logic scaffolding framework for inferential rule generation and use it to construct an inferential rule base, ULogic, comprising both primitive and compositional rules across five domains. Our analysis of GPT-series models over a rule subset reveals significant gaps in LLMs' logic understanding compared to human performance, especially on compositional and structurally complex rules, with certain bias patterns. We further distill these rules into a smaller-scale inference engine for flexible rule generation and for enhancing downstream reasoning. In a multi-judger evaluation, our inference engine proves effective at generating accurate, complex, and abstract conclusions and premises, and it improves performance on various commonsense reasoning tasks. Overall, our work sheds light on LLMs' limitations in grasping inferential rules and suggests ways to enhance their logical reasoning abilities. Code and data are available at https://github.com/SiyuanWangw/ULogic.

