CL, AI
May 2, 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

arXiv:1905.00537v3
2786 citations
AI Analysis

SuperGLUE gives natural language processing researchers and developers a new, stickier benchmark for assessing and advancing language understanding models.

The authors introduced SuperGLUE, a more challenging benchmark for evaluating general-purpose language understanding systems, because model performance on GLUE had surpassed the level of non-expert humans, leaving limited headroom for further research.

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.

Code Implementations (6 repos)
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
