CL, AI · Dec 18, 2023

Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models

arXiv:2312.11720v2 · 4 citations · h-index: 13 · NeSy
Originality: Synthesis-oriented
AI Analysis

This work assesses logical reasoning in AI models, a problem relevant to researchers and practitioners. It is incremental in that it builds on existing methods for evaluating model capabilities.

The paper investigates whether encoder-only transformer language models can perform logical reasoning by training them to determine logical validity on several datasets. The models achieve reasonable success but struggle to transfer this ability across datasets, which suggests they may learn dataset-specific features rather than general logical capabilities.

Logical reasoning is central to complex human activities, such as thinking, debating, and planning; it is also a central component of many AI systems. In this paper, we investigate the extent to which encoder-only transformer language models (LMs) can reason according to logical rules. We ask whether those LMs can deduce theorems in propositional calculus and first-order logic; whether their relative success in these problems reflects general logical capabilities; and which layers contribute the most to the task. First, we show for several encoder-only LMs that they can be trained, to a reasonable degree, to determine logical validity on various datasets. Next, by cross-probing fine-tuned models on these datasets, we show that LMs have difficulty in transferring their putative logical reasoning ability, which suggests that they may have learned dataset-specific features instead of a general capability. Finally, we conduct a layerwise probing experiment, which shows that the hypothesis classification task is mostly solved through higher layers.
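To make the "logical validity" label concrete for the propositional case, here is a minimal sketch of a truth-table entailment check: a hypothesis is valid given the premises if no truth assignment makes every premise true and the hypothesis false. This is not the paper's method or code; the nested-tuple formula encoding and the function names `atoms`, `evaluate`, and `entails` are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical encoding: a formula is either a variable name (str) or a
# tuple (connective, arg1, ...) with connectives "not", "and", "or", "implies".

def atoms(formula):
    """Collect the variable names that occur in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(arg) for arg in formula[1:]))

def evaluate(formula, assignment):
    """Evaluate a formula under a dict mapping variable names to booleans."""
    if isinstance(formula, str):
        return assignment[formula]
    op, *args = formula
    values = [evaluate(arg, assignment) for arg in args]
    if op == "not":
        return not values[0]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    if op == "implies":
        return (not values[0]) or values[1]
    raise ValueError(f"unknown connective: {op}")

def entails(premises, hypothesis):
    """Return True if the hypothesis holds in every model of the premises."""
    names = sorted(set().union(atoms(hypothesis), *(atoms(p) for p in premises)))
    for bits in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, bits))
        premises_hold = all(evaluate(p, assignment) for p in premises)
        if premises_hold and not evaluate(hypothesis, assignment):
            return False  # counterexample: premises true, hypothesis false
    return True

# Modus ponens is valid; affirming the consequent is not.
modus_ponens = entails([("implies", "p", "q"), "p"], "q")
affirming_consequent = entails([("implies", "p", "q"), "q"], "p")
```

A checker like this only produces ground-truth labels for propositional examples; the question the abstract raises is whether a fine-tuned encoder predicts such labels from text and whether that ability carries over to other datasets. Exhaustive enumeration is exponential in the number of variables and does not extend to first-order logic.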

Code Implementations: 1 repo
