LG | CL | ML | Oct 13, 2020

Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three

arXiv:2010.06721v2 | 13 citations
Originality: Incremental advance
AI Analysis

This paper addresses calibration issues in structured prediction for NLP applications, offering a practical way to reduce inference-time computational cost while maintaining accuracy.

The paper tackles the problem of producing well-calibrated structured prediction models without the high inference cost of ensembles, using ensemble distillation. It shows that the distilled model retains much of, and occasionally improves on, the ensemble's performance and calibration on named-entity recognition and machine translation tasks.
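To make "calibration" concrete, the sketch below computes expected calibration error (ECE), one common way to quantify how far a model's confidence is from its accuracy. It is an illustration only: the function name, the equal-width binning, and the default of 10 bins are assumptions, and the summary above does not say which calibration metric the paper reports.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probability of the chosen label, each in [0, 1]
    correct:     booleans, True where the chosen label was right
    n_bins:      number of equal-width confidence bins (an assumption)
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data: an overconfident model versus a well-calibrated one.
overconfident = expected_calibration_error([0.95] * 4, [True, True, True, False])
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
```

In this toy case the overconfident predictions give an ECE of about 0.2 (confidence 0.95 against accuracy 0.75), while the calibrated ones give 0. For structured outputs such as tag sequences, the confidences would typically be taken per token, which is again an assumption here.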

Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available. In this paper, we study ensemble distillation as a general framework for producing well-calibrated structured prediction models while avoiding the prohibitive inference-time cost of ensembles. We validate this framework on two tasks: named-entity recognition and machine translation. We find that, across both tasks, ensemble distillation produces models which retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles, while only requiring a single model during test-time.
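The core idea in the abstract, training a single student to match an ensemble, can be sketched as a token-level loss. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the teacher signal is the uniform average of the ensemble members' per-token distributions and that the student is trained with cross-entropy against that average. All function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ensemble_average(member_probs):
    """Average per-token distributions over ensemble members.

    member_probs[k][t][c] is member k's probability of class c at token t.
    """
    n_members = len(member_probs)
    n_tokens = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [
            sum(member_probs[k][t][c] for k in range(n_members)) / n_members
            for c in range(n_classes)
        ]
        for t in range(n_tokens)
    ]


def distillation_loss(teacher_probs, student_logits):
    """Mean per-token cross-entropy of the student against the teacher."""
    total = 0.0
    for q, logits in zip(teacher_probs, student_logits):
        log_p = log_softmax(logits)
        total -= sum(qc * lp for qc, lp in zip(q, log_p))
    return total / len(teacher_probs)


# Two ensemble members, two tokens, three classes (toy numbers).
members = [
    [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]],
    [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]],
]
teacher = ensemble_average(members)

# A student that outputs uniform distributions at every token.
uniform_loss = distillation_loss(teacher, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

# A student whose distribution matches the teacher exactly.
matched_logits = [[math.log(q) for q in row] for row in teacher]
matched_loss = distillation_loss(teacher, matched_logits)
```

The loss is minimized when the student reproduces the averaged distribution, at which point it equals the teacher's entropy. At test time only the student is evaluated, which is where the inference saving over running every ensemble member comes from.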

