Learning when to skim and when to read
This work addresses a practical efficiency concern for NLP practitioners, but the contribution is incremental: it builds on existing methods for reducing computation.
The paper tackles the problem of reducing computational cost in deep learning for NLP by pairing a fast but weak classifier with a strong but slow model. It reports significant efficiency gains on sentiment classification with both a probability-threshold method and a secondary decision network.
Many recent advances in deep learning for natural language processing have come at increasing computational cost, but the power of these state-of-the-art models is not needed for every example in a dataset. We demonstrate two approaches to reducing unnecessary computation in cases where a fast but weak baseline classifier and a stronger, slower model are both available. Applying an AUC-based metric to the task of sentiment classification, we find significant efficiency gains with both a probability-threshold method for reducing computational cost and one that uses a secondary decision network.
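As a rough illustration of the probability-threshold idea mentioned in the abstract, the sketch below accepts the fast classifier's prediction when its top class probability clears a threshold and otherwise defers to the slow model. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `threshold_cascade`, the `slow_predict` callback, the toy probabilities, and the threshold of 0.8 are all hypothetical, and the paper's exact decision rule and AUC-based metric are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def threshold_cascade(
    fast_probs: Sequence[Sequence[float]],
    slow_predict: Callable[[int], int],
    threshold: float,
) -> Tuple[List[int], float]:
    """Route each example to the fast or slow model (hypothetical helper).

    fast_probs[i] holds the fast classifier's class probabilities for
    example i. slow_predict(i) returns the slow model's label for example i
    and is only called when the fast classifier is not confident enough.
    Returns the final labels and the fraction of examples sent to the
    slow model.
    """
    predictions: List[int] = []
    slow_calls = 0
    for i, probs in enumerate(fast_probs):
        confidence = max(probs)
        if confidence >= threshold:
            # Confident: keep the fast classifier's argmax label.
            predictions.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            # Not confident: pay for the slow model on this example only.
            predictions.append(slow_predict(i))
            slow_calls += 1
    slow_fraction = slow_calls / len(fast_probs) if len(fast_probs) else 0.0
    return predictions, slow_fraction


# Toy data (made up for illustration): four binary sentiment examples.
fast_probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
slow_labels = [0, 1, 1, 0]  # stand-in for the slow model's outputs

preds, slow_fraction = threshold_cascade(
    fast_probs, lambda i: slow_labels[i], threshold=0.8
)
print(preds, slow_fraction)  # [0, 1, 1, 0] 0.5
```

Raising the threshold sends more examples to the slow model, and lowering it sends fewer, so sweeping the threshold traces out a trade-off between accuracy and computation.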