LG CL ML Oct 13, 2020

Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three

Steven Reich, David Mueller, Nicholas Andrews

arXiv:2010.06721v27.913 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses calibration issues in structured prediction for NLP applications, offering a practical way to avoid the inference-time cost of ensembles while maintaining accuracy.

The paper tackles the problem of producing well-calibrated structured prediction models without the high inference cost of ensembles, using ensemble distillation. It shows that distilled models retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles on named-entity recognition and machine translation tasks.

Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available. In this paper, we study ensemble distillation as a general framework for producing well-calibrated structured prediction models while avoiding the prohibitive inference-time cost of ensembles. We validate this framework on two tasks: named-entity recognition and machine translation. We find that, across both tasks, ensemble distillation produces models that retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles, while requiring only a single model at test time.
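The general idea of ensemble distillation can be illustrated with a minimal sketch. It assumes a simplified token-level setting in which each ensemble member emits per-position label logits, and the student is trained to match the averaged ensemble distribution via cross-entropy. The function names (`ensemble_distribution`, `distillation_loss`) and array shapes are hypothetical choices for illustration; the paper's actual objectives for NER and machine translation may differ.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the label axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def ensemble_distribution(teacher_logits):
    # teacher_logits: (n_teachers, seq_len, n_labels).
    # Average the per-teacher probabilities (not the logits) at each position.
    return softmax(teacher_logits).mean(axis=0)

def distillation_loss(student_logits, teacher_logits, eps=1e-12):
    # Cross-entropy between the averaged ensemble distribution (soft target)
    # and the student distribution, averaged over sequence positions.
    target = ensemble_distribution(teacher_logits)
    student_log_probs = np.log(softmax(student_logits) + eps)
    return float(-(target * student_log_probs).sum(axis=-1).mean())

# Toy data (assumed shapes): 4 teachers, 5 tokens, 3 labels.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 5, 3))
student_logits = rng.normal(size=(5, 3))

target = ensemble_distribution(teacher_logits)
loss = distillation_loss(student_logits, teacher_logits)
# A student that reproduces the ensemble distribution exactly attains the minimum.
best_loss = distillation_loss(np.log(target), teacher_logits)
```

Because the student is trained on soft targets rather than hard labels, only a single forward pass through the student is needed at test time, which is the cost saving the abstract describes.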
