CLCCLGApr 21, 2021

Sensitivity as a Complexity Measure for Sequence Classification Tasks

arXiv:2104.10343v1654 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of task complexity prediction for researchers in NLP, offering a theoretical tool to explain why some tasks are harder, though it is incremental as it builds on existing sensitivity theory.

The paper tackles the problem of understanding and predicting the complexity of sequence classification tasks by introducing a theoretical framework based on Boolean function sensitivity, showing that tasks requiring high sensitivity are more difficult and that sensitivity predicts model performance on 15 NLP tasks, with higher sensitivity on challenging GLUE tasks.

We introduce a theoretical framework for understanding and predicting the complexity of sequence classification tasks, using a novel extension of the theory of Boolean function sensitivity. The sensitivity of a function, given a distribution over input sequences, quantifies the number of disjoint subsets of the input sequence that can each be individually changed to change the output. We argue that standard sequence classification methods are biased towards learning low-sensitivity functions, so that tasks requiring high sensitivity are more difficult. To that end, we show analytically that simple lexical classifiers can only express functions of bounded sensitivity, and we show empirically that low-sensitivity functions are easier to learn for LSTMs. We then estimate sensitivity on 15 NLP tasks, finding that sensitivity is higher on challenging tasks collected in GLUE than on simple text classification tasks, and that sensitivity predicts the performance both of simple lexical classifiers and of vanilla BiLSTMs without pretrained contextualized embeddings. Within a task, sensitivity predicts which inputs are hard for such simple models. Our results suggest that the success of massively pretrained contextual representations stems in part because they provide representations from which information can be extracted by low-sensitivity decoders.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes