CL, Jan 11, 2019

EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference

arXiv:1901.03735v2, 1032 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better evaluation of quantitative reasoning in NLP systems, a capability that matters for building more intelligent language understanding models. It is rated incremental because its contribution is a benchmark and a new baseline.

The authors address the evaluation of quantitative reasoning in natural language inference by introducing EQUATE, a benchmark framework. They find that, on average, state-of-the-art NLI models do not improve over a majority-class baseline, which suggests these models do not implicitly learn to reason with quantities. They also establish a new baseline, Q-REAS, which manipulates quantities symbolically. Compared with the best-performing NLI model, Q-REAS scores +24.2% on numerical reasoning tests but -8.1% on verbal reasoning.

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.

Code Implementations: 1 repo
