Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models
This work addresses the evaluation of natural language reasoning in AI systems. It is incremental: it builds on prior research on natural language satisfiability, shifting the focus to problem distribution and computational complexity.
The paper investigates how varying computational complexity classes and grammatical constructs affect transformer-based language models' ability to learn inference rules for natural language satisfiability, and conducts an empirical study to explore the problem distribution.
Efforts to apply transformer-based language models (TLMs) to the problem of reasoning in natural language have enjoyed ever-increasing success in recent years. The most fundamental task in this area, to which nearly all others can be reduced, is that of determining satisfiability. From a logical point of view, however, satisfiability problems vary along several dimensions, which may affect TLMs' ability to learn how to solve them. In particular, instances of natural language satisfiability can belong to different computational complexity classes depending on the language fragment in which they are expressed. Although prior research has explored natural language satisfiability, this point has not been adequately discussed. Hence, we investigate how problem instances drawn from different computational complexity classes and featuring different grammatical constructs affect TLMs' ability to learn rules of inference. Furthermore, to evaluate TLMs faithfully, we conduct an empirical study to explore the distribution of satisfiability problems.
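
The claim that nearly all other reasoning tasks can be reduced to satisfiability can be illustrated with entailment: a set of premises entails a conclusion exactly when the premises together with the negated conclusion are unsatisfiable. The following is a minimal sketch of that reduction, assuming a toy propositional setting with a brute-force check. It is not the paper's method and does not model the natural language fragments studied here; the names `is_satisfiable`, `entails`, and the example variables are hypothetical and chosen for illustration only.

```python
from itertools import product


def is_satisfiable(formulas, variables):
    """Return True if some truth assignment makes every formula true.

    Each formula is a function from an assignment (dict: name -> bool) to bool.
    This is an exhaustive search over all 2**len(variables) assignments.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(formula(assignment) for formula in formulas):
            return True
    return False


def entails(premises, conclusion, variables):
    """Reduce entailment to satisfiability.

    The premises entail the conclusion iff the premises plus the
    negated conclusion have no satisfying assignment.
    """
    def negated(assignment):
        return not conclusion(assignment)

    return not is_satisfiable(list(premises) + [negated], variables)


# Toy propositional stand-in for "If it rains, the ground is wet. It rains."
variables = ["rains", "wet"]
premises = [
    lambda a: (not a["rains"]) or a["wet"],  # rains -> wet
    lambda a: a["rains"],                    # rains
]
conclusion = lambda a: a["wet"]              # wet

print(entails(premises, conclusion, variables))  # True
```

The exhaustive search takes time exponential in the number of variables, which is acceptable only for toy inputs. For natural language fragments, the cost of deciding satisfiability depends on the fragment's complexity class, which is the dimension the abstract singles out.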