Reliable Reasoning Beyond Natural Language
This work addresses reasoning limitations of LLMs in AI applications, although its contribution is incremental because it builds on existing neurosymbolic methods.
The paper tackles unreliable reasoning in Large Language Models (LLMs) with a neurosymbolic approach that integrates Prolog for symbolic reasoning. It achieves near-perfect accuracy on a new dataset (NLR) and large performance gains on benchmarks such as GSM8k and BIG-bench Navigate.
Despite their linguistic competence, Large Language Models (LLMs) often struggle to reason reliably and flexibly. To identify these shortcomings, we introduce the Non-Linear Reasoning (NLR) dataset, a collection of 55 unique, hand-designed problems that target reasoning bottlenecks arising from the sequential prediction paradigm of LLMs and the inherently linear nature of natural language. NLR tasks require iterative updates, backtracking, and reasoning across multiple parallel chains of thought, yet only basic arithmetic. To address these limitations, we propose a neurosymbolic reasoning approach that integrates Prolog, a symbolic reasoning engine, into the inference pipeline of LLMs. This division of labor shifts the LLM's task from performing iterative computations to inferring all information, whether explicit or implied through common sense, and encoding it as logical code. Our method yields large and robust performance gains on the GSM8k and BIG-bench Navigate benchmarks and achieves near-perfect accuracy on NLR problems, remaining robust even as variable interdependence (the number of other variables on which the value of a single variable depends) increases.
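To make the division of labor concrete, here is a minimal Python sketch under stated assumptions; it is not Prolog and not the authors' implementation. The hypothetical `DEFINITIONS` dict stands in for the logical code an LLM might emit (in Prolog, a clause such as `bob(B) :- alice(A), B is 2 * A.`), and the hypothetical `solve` function, a small recursive evaluator, stands in for the symbolic engine that performs all the arithmetic. The hypothetical `interdependence` function counts the variables a given variable depends on; counting transitive as well as direct dependencies is an assumed reading of the definition above.

```python
"""Sketch: the LLM states relationships, a symbolic engine computes.

Assumptions (hypothetical, for illustration only): DEFINITIONS plays the
role of LLM-emitted Prolog clauses; solve() plays the role of the engine.
"""
import ast
import operator

# Hypothetical stand-in for clauses an LLM might emit from a word problem:
# "Alice has 5 apples. Bob has twice as many. Carol has 3 more than Bob."
DEFINITIONS = {
    "alice": "5",
    "bob": "2 * alice",
    "carol": "bob + 3",
    "total": "alice + bob + carol",
}

# Only basic arithmetic is supported, matching the NLR setting.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _parse(expr):
    """Parse a definition into a Python expression AST."""
    return ast.parse(expr, mode="eval").body


def dependencies(name, defs):
    """Variable names referenced directly in the definition of `name`."""
    tree = _parse(defs[name])
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def interdependence(name, defs):
    """Count variables `name` depends on, directly or transitively (assumed reading)."""
    seen, stack = set(), list(dependencies(name, defs))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dependencies(dep, defs))
    return len(seen)


def solve(name, defs, cache=None, active=None):
    """Evaluate `name` goal-first: resolve referenced variables, then compute.

    `cache` memoizes solved variables; `active` detects cyclic definitions.
    """
    cache = {} if cache is None else cache
    active = set() if active is None else active
    if name in cache:
        return cache[name]
    if name in active:
        raise ValueError(f"cyclic definition involving {name!r}")
    active.add(name)

    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return solve(node.id, defs, cache, active)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression in {name!r}")

    cache[name] = ev(_parse(defs[name]))
    active.discard(name)
    return cache[name]


if __name__ == "__main__":
    print(solve("total", DEFINITIONS))            # → 28
    print(interdependence("total", DEFINITIONS))  # → 3
```

In this sketch the "model" only has to state relationships such as "Bob has twice as many as Alice"; evaluation order and arithmetic are left entirely to the engine, which is the shift in the LLM's task that the abstract describes.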