CL · Sep 10, 2021

Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

arXiv:2109.04922v1663 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for more informative evaluation in NLP, particularly for assessing the coherence of text classifiers' predictions, but the advance is incremental because it builds on existing benchmarks.

The paper tackles the problem of evaluating text classifiers beyond accuracy by proposing a novel measure of prediction coherence, applied to two language understanding benchmarks to show it is quick, effective, and versatile.

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in ideas and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines' predictions.

Code Implementations: 1 repo
