CL · AI · LO · May 2, 2020

RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms

arXiv:2005.00782v4 · 691 citations
Originality: Incremental advance
AI Analysis

This work addresses a critical gap for human-AI communication by exposing the limitations of PTLMs' commonsense reasoning. It is incremental, however, because it focuses on evaluating the problem rather than solving it.

The authors tackled the problem of evaluating robust commonsense inference in pre-trained language models (PTLMs) by proposing a new benchmark, RICA, which tests inference capabilities under textual perturbations. They found that PTLMs perform no better than random guessing in the zero-shot setting and are not robust to perturbation attacks.
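A claim that a model "performs no better than random guessing" can be checked with a significance test against the chance rate. The sketch below is not the authors' evaluation code. It assumes each probe is a multiple-choice item with `num_choices` equally likely options, and it uses a one-sided exact binomial test (an illustrative choice, not one stated in this summary). The helper names `binomial_sf` and `better_than_chance` are hypothetical. The test is computed in log space so that it does not overflow on probe sets of 10k or more statements.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Each term is computed in log space with lgamma, so large n (e.g. 10k
    probes) does not overflow the way comb(n, i) * p**i would.
    """
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def better_than_chance(correct: int, total: int,
                       num_choices: int = 2, alpha: float = 0.05) -> bool:
    """Hypothetical helper: is accuracy significantly above uniform guessing?"""
    chance = 1.0 / num_choices
    return binomial_sf(correct, total, chance) < alpha


if __name__ == "__main__":
    print(better_than_chance(520, 1000))  # 52% on binary probes
    print(better_than_chance(560, 1000))  # 56% on binary probes
```

As an example, 520 correct answers out of 1,000 binary probes gives a one-sided p-value of roughly 0.11, so it does not count as better than chance at α = 0.05. A score of 560 out of 1,000 does.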

Pre-trained language models (PTLMs) have achieved impressive performance on commonsense inference benchmarks, but their ability to employ commonsense to make robust inferences, which is crucial for effective communication with humans, is debated. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA: Robust Inference capability based on Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. To generate data for this challenge, we develop a systematic and scalable procedure using commonsense knowledge bases and probe PTLMs across two different evaluation settings. Extensive experiments on our generated probe sets with more than 10k statements show that PTLMs perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks. We also find that fine-tuning on similar statements offers limited gains, as PTLMs still fail to generalize to unseen inferences. Our new large-scale benchmark exposes a significant gap between PTLMs and human-level language understanding and offers a new challenge for PTLMs to demonstrate commonsense.
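The abstract ties poor robustness to statistical biases: a model can look reasonable on individual statements while failing once a perturbation flips the expected answer. The sketch below shows one way to measure this. It groups each statement with perturbed variants and gives a model credit for a group only if every variant is answered correctly. It is a hypothetical illustration and does not reproduce RICA's data format or scoring. The `Variant`/`ProbeGroup` structures, the `predict` callable, the antonym-swap perturbation, and the example sentences are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Variant:
    text: str           # statement containing a [MASK] slot (hypothetical format)
    options: List[str]  # candidate fillers for the slot
    answer: str         # the filler that makes the statement true


@dataclass
class ProbeGroup:
    variants: List[Variant]  # original statement followed by perturbed versions


# A predictor maps (masked statement, options) to the option it chooses.
Predictor = Callable[[str, List[str]], str]


def evaluate(groups: List[ProbeGroup], predict: Predictor) -> Dict[str, float]:
    """Per-statement accuracy vs. per-group (all variants correct) accuracy."""
    total = correct = robust = 0
    for group in groups:
        all_ok = True
        for v in group.variants:
            ok = predict(v.text, v.options) == v.answer
            total += 1
            correct += ok
            all_ok = all_ok and ok
        robust += all_ok
    return {"statement_accuracy": correct / total,
            "group_accuracy": robust / len(groups)}


# Hypothetical probes: an antonym swap in the premise flips the correct answer.
groups = [
    ProbeGroup([
        Variant("A is heavier than B, so A is [MASK] difficult to lift than B.",
                ["more", "less"], "more"),
        Variant("A is lighter than B, so A is [MASK] difficult to lift than B.",
                ["more", "less"], "less"),
    ]),
    ProbeGroup([
        Variant("A is faster than B, so A takes [MASK] time to finish than B.",
                ["more", "less"], "less"),
        Variant("A is slower than B, so A takes [MASK] time to finish than B.",
                ["more", "less"], "more"),
    ]),
]


def always_first(text: str, options: List[str]) -> str:
    """A biased baseline that ignores the statement and picks the first option."""
    return options[0]


if __name__ == "__main__":
    print(evaluate(groups, always_first))
```

The always-first predictor scores 50% per statement, which matches chance on binary choices, yet 0% per group. That gap is the signature of a model relying on a surface bias rather than on the inference, and grouping variants exposes it.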

