CL, Jul 25, 2017

The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations

arXiv:1707.08172v11120 citations
AI Analysis

This shared task benchmarks sentence representation learning models on multi-genre natural language inference, using the MultiNLI corpus as the common evaluation set.

The RepEval 2017 Shared Task evaluated neural network models on the MultiNLI corpus for natural language inference, with all five teams surpassing baseline accuracies and the best model achieving 74.5% accuracy on the genre-matched test set.

This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017). All five participating teams beat the bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in Williams et al. The best single model used stacked BiLSTMs with residual connections to extract sentence features and reached 74.5% accuracy on the genre-matched test set. Surprisingly, the results of the competition were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations for sentence meaning.

