CL | Nov 30, 2023

Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension

arXiv:2311.18353v1134 citations | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better evaluation of language models' critical reasoning abilities, particularly their ability to explain why incorrect alternatives should be eliminated. The contribution is incremental in that it builds on an existing dataset.

The authors evaluate language models' logical reading comprehension by building a dataset of crowdsourced rationale texts for multiple-choice questions. They show that models such as InstructGPT struggle with the resulting subquestions even when they answer the main questions correctly, and that performance is particularly poor on subquestions written for the incorrect options.

To precisely evaluate a language model's capability for logical reading comprehension, we present a dataset for testing the understanding of the rationale behind critical reasoning. For questions taken from an existing multiple-choice logical reading comprehension dataset, we crowdsource rationale texts that explain why we should select or eliminate answer options, resulting in 3,003 multiple-choice subquestions that are associated with 943 main questions. Experiments on our dataset show that recent large language models (e.g., InstructGPT) struggle to answer the subquestions even if they are able to answer the main questions correctly. We find that the models perform particularly poorly in answering subquestions written for the incorrect options of the main questions, implying that the models have a limited capability for explaining why incorrect alternatives should be eliminated. These results suggest that our dataset encourages further investigation into the critical reasoning ability of language models while focusing on the elimination process of relevant alternatives.
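
The breakdown described in the abstract, subquestion accuracy split by whether the subquestion was written for a correct or an incorrect option of its main question, can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the `SubQuestion` record, its field names, and the toy predictions are all hypothetical and do not reflect the dataset's actual schema or any reported result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SubQuestion:
    # All field names are hypothetical, chosen only for this sketch.
    main_id: str                  # identifier of the parent main question
    targets_correct_option: bool  # True if the rationale concerns the correct option
    gold: int                     # index of the gold answer to the subquestion
    predicted: int                # index of the model's answer to the subquestion


def accuracy_by_option_type(subquestions):
    """Return subquestion accuracy grouped by the type of option targeted."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sq in subquestions:
        key = "correct_option" if sq.targets_correct_option else "incorrect_option"
        totals[key] += 1
        hits[key] += int(sq.predicted == sq.gold)
    return {key: hits[key] / totals[key] for key in totals}


# Made-up toy records, for illustration only.
examples = [
    SubQuestion("q1", True, 0, 0),
    SubQuestion("q1", False, 2, 1),
    SubQuestion("q1", False, 3, 3),
    SubQuestion("q2", True, 1, 1),
    SubQuestion("q2", False, 0, 2),
]

scores = accuracy_by_option_type(examples)
print(scores)
```

Grouping by option type keeps the two accuracies separate, so a gap between them, of the kind the abstract reports, is not hidden by a single aggregate score.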

