LG · Feb 20, 2023

Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

arXiv:2302.10093v2 · h-index: 20
AI Analysis

This work addresses the need for efficient inference in resource-constrained settings such as on-device applications. It is an incremental improvement in model compression and ensemble methods.

The paper tackles the problem of decomposing a large pretrained teacher model into smaller student models for efficient on-device inference, achieving ensembles that maintain similar performance to the teacher while allowing flexible tuning of accuracy versus cost at runtime.

We study the problem of progressive ensemble distillation: Given a large, pretrained teacher model $g$, we seek to decompose the model into smaller, low-inference cost student models $f_i$, such that progressively evaluating additional models in this ensemble leads to improved predictions. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost at runtime, which is useful for a number of applications in on-device inference. The method we propose, B-DISTIL, relies on an algorithmic procedure that uses function composition over intermediate activations to construct expressive ensembles with similar performance as $g$, but with smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across standard image, speech, and sensor datasets. We also provide theoretical guarantees in terms of convergence and generalization.
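The runtime trade-off described in the abstract can be illustrated with a minimal sketch. It is not the B-DISTIL procedure itself: the combination rule (a running average of student outputs), the function name `progressive_predict`, the `budget` parameter, and the toy students are all assumptions made for illustration. It only shows how evaluating more students $f_i$ refines a prediction while a budget caps inference cost.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Student = Callable[[Sequence[float]], Vector]


def progressive_predict(
    students: Sequence[Student], x: Sequence[float], budget: int
) -> List[Vector]:
    """Evaluate at most `budget` students and return the ensemble
    prediction after each one (assumed rule: running average)."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    total: Vector = []
    history: List[Vector] = []
    for k, student in enumerate(students[:budget], start=1):
        out = student(x)
        # Accumulate the outputs of the students evaluated so far.
        total = list(out) if not total else [t + o for t, o in zip(total, out)]
        # Ensemble prediction using the first k students.
        history.append([t / k for t in total])
    return history


# Hypothetical toy students that return fixed two-class scores.
def student_a(x: Sequence[float]) -> Vector:
    return [0.75, 0.25]


def student_b(x: Sequence[float]) -> Vector:
    return [0.25, 0.75]


def student_c(x: Sequence[float]) -> Vector:
    return [0.0, 1.0]


# A budget of 2 evaluates only the first two students.
history = progressive_predict([student_a, student_b, student_c], [1.0], budget=2)
```

Raising `budget` evaluates more students and spends more compute per input; lowering it returns an earlier, cheaper prediction from the same ensemble.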

