CL · AI · Oct 12, 2024

Transformer-based Language Models for Reasoning in the Description Logic ALCQ

arXiv:2410.09613v1 · 1 citation · h-index: 47 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more challenging benchmarks to assess logical reasoning in AI, though it is incremental as it builds on existing transformer models and datasets.

The paper evaluates transformer-based language models on complex logical reasoning. It constructs DELTA$_D$, a dataset of 384K examples built on the expressive description logic $\mathcal{ALCQ}$, and shows that a fine-tuned DeBERTa model masters entailment checking, while the GPT models improve significantly with few-shot prompting, even with only 9 shots.

Recent advancements in transformer-based language models have sparked research into their logical reasoning capabilities. Most of the benchmarks used to evaluate these models are simple: generated from short (fragments of) first-order logic sentences with only a few logical operators and quantifiers. We construct the natural language dataset, DELTA$_D$, using the expressive description logic language $\mathcal{ALCQ}$. DELTA$_D$ comprises 384K examples and increases in two dimensions: i) reasoning depth, and ii) linguistic complexity. In this way, we systematically investigate the logical reasoning capabilities of a supervised fine-tuned DeBERTa-based model and two large language models (GPT-3.5, GPT-4) with few-shot prompting. We show that the DeBERTa-based model fine-tuned on our dataset can master the entailment checking task. Moreover, the performance of GPTs can improve significantly even when a small number of samples is provided (9 shots). We open-source our code and datasets.
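
The few-shot setup mentioned in the abstract can be pictured with a minimal sketch of how a 9-shot entailment prompt might be assembled. Everything here is an assumption for illustration: the `Example` class, the `build_few_shot_prompt` helper, the prompt wording, the `True`/`False`/`Unknown` label names, and the toy sentences are hypothetical and are not taken from DELTA$_D$ or from the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One entailment item (hypothetical schema, not the dataset's actual format)."""
    context: str    # natural-language knowledge base
    statement: str  # candidate statement to check against the context
    label: str      # assumed label names: "True", "False", "Unknown"


def build_few_shot_prompt(shots: List[Example], query: Example) -> str:
    """Concatenate solved examples, then the unsolved query, into one prompt."""
    parts = [
        "Decide whether the statement follows from the context. "
        "Reply with True, False, or Unknown."
    ]
    for ex in shots:
        parts.append(
            f"Context: {ex.context}\nStatement: {ex.statement}\nAnswer: {ex.label}"
        )
    # The query is left unanswered so the model completes the label.
    parts.append(f"Context: {query.context}\nStatement: {query.statement}\nAnswer:")
    return "\n\n".join(parts)


# Toy, made-up items: nine solved shots and one query.
shots = [
    Example(
        context=f"Person {i} admires at least two people who are kind.",
        statement=f"Person {i} admires someone.",
        label="True",
    )
    for i in range(9)
]
query = Example(
    context="Anne admires at most one person who is kind.",
    statement="Anne admires two kind people.",
    label="False",
)
prompt = build_few_shot_prompt(shots, query)
```

The resulting `prompt` string would then be sent to a chat or completion model. The number of shots is simply the length of the `shots` list, so smaller or larger shot counts can be compared by slicing it.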
