CL, AI · Oct 11, 2022

Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems

Amazon
arXiv:2210.05528v1312 citations · h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses efficiency and accuracy challenges for NLP systems, potentially enabling broader real-world adoption, though it is incremental as it builds on existing cascading ideas.

The paper tackles the problem of computational inefficiency in NLP systems by proposing model cascading, a technique that uses models of varying capacities to improve efficiency and accuracy, achieving up to 88.93% computation cost savings and up to 2.18% accuracy improvement in experiments.

Do all instances need inference through the big models for a correct prediction? Perhaps not; some instances are easy and can be answered correctly by even small-capacity models. This provides opportunities for improving the computational efficiency of systems. In this work, we present an explorative study on 'model cascading', a simple technique that utilizes a collection of models of varying capacities to accurately yet efficiently output predictions. Through comprehensive experiments in multiple task settings that differ in the number of models available for cascading (K value), we show that cascading improves both the computational efficiency and the prediction accuracy. For instance, in the K=3 setting, cascading saves up to 88.93% of the computation cost and consistently achieves superior prediction accuracy with an improvement of up to 2.18%. We also study the impact of introducing additional models in the cascade and show that it further increases the efficiency improvements. Finally, we hope that our work will facilitate the development of efficient NLP systems, making their widespread adoption in real-world applications possible.
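
The cascading idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each model returns class probabilities, that the highest probability serves as the confidence score, and that an instance moves on to the next (larger) model only when that score falls below a per-stage threshold. The names `Stage`, `cascade_predict`, `small`, and `large`, along with the cost and threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable that maps an input text to class probabilities.
Model = Callable[[str], Sequence[float]]


@dataclass
class Stage:
    model: Model
    cost: float       # hypothetical per-instance inference cost
    threshold: float  # hypothetical confidence required to stop at this stage


def cascade_predict(x: str, stages: List[Stage]) -> Tuple[int, float]:
    """Return (predicted label, total cost spent) for a single instance."""
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    spent = 0.0
    label = -1
    for i, stage in enumerate(stages):
        probs = stage.model(x)
        spent += stage.cost
        confidence = max(probs)
        label = max(range(len(probs)), key=probs.__getitem__)
        is_last = i == len(stages) - 1
        # Stop as soon as a model is confident enough; the last model always answers.
        if is_last or confidence >= stage.threshold:
            break
    return label, spent


# Toy stand-ins for a small and a large classifier (assumed, not real models).
def small(x: str) -> Sequence[float]:
    return [0.95, 0.05] if len(x) < 10 else [0.55, 0.45]


def large(x: str) -> Sequence[float]:
    return [0.90, 0.10] if len(x) < 10 else [0.10, 0.90]


stages = [Stage(small, cost=1.0, threshold=0.9), Stage(large, cost=10.0, threshold=0.0)]
easy_result = cascade_predict("short", stages)
hard_result = cascade_predict("a much longer input", stages)
```

In this toy setup the easy instance stops at the small model and costs 1.0, while the hard instance falls through to the large model and costs 11.0, slightly more than running the large model alone. Any overall saving therefore depends on how many instances exit early, which is why the thresholds matter.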

