LG · CV · Apr 19, 2024

QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring

arXiv:2404.12599v2 · 3 citations · h-index: 3 · ICML
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of deploying uncertainty quantification on ultra-low-power, KB-sized tinyML devices. Its contribution is an incremental improvement over prior early-exit ensemble methods.

The paper tackles uncertainty quantification for tinyML models on resource-constrained devices by proposing QUTE, a resource-efficient early-exit-assisted ensemble architecture. QUTE delivers superior uncertainty quality on tiny models and comparable quality on larger models with 59% smaller model sizes than the closest prior work, and it reduces latency by 31% on average when deployed on a microcontroller.

Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized TinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models, achieving comparable performance on larger models with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.
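As a rough illustration of how an ensemble of output blocks can yield an uncertainty estimate in a single forward pass, the sketch below averages the softmax outputs of several heads and scores the result with predictive entropy. It also includes a simple sliding-window confidence monitor as one possible way to flag an accuracy-drop event without labels. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `ConfidenceMonitor` class, the window size, and the threshold are all hypothetical, and the early-exit distillation step used to train the heads is not shown.

```python
import math
from collections import deque


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(head_logits):
    """Average the softmax outputs of several output heads.

    head_logits: one list of logits per head (hypothetical layout).
    """
    probs = [softmax(logits) for logits in head_logits]
    n_heads = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_heads for c in range(n_classes)]


def predictive_entropy(probs):
    """Entropy of the averaged prediction; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class ConfidenceMonitor:
    """Hypothetical label-free monitor: flags a possible accuracy drop when
    the mean top-class confidence over a sliding window falls below a
    threshold. Window size and threshold are illustrative assumptions."""

    def __init__(self, window=4, threshold=0.6):
        self.confidences = deque(maxlen=window)
        self.threshold = threshold

    def update(self, probs):
        self.confidences.append(max(probs))
        if len(self.confidences) < self.confidences.maxlen:
            return False  # not enough samples yet
        mean_conf = sum(self.confidences) / len(self.confidences)
        return mean_conf < self.threshold
```

In this sketch, heads that agree produce a peaked average and low entropy, while heads that disagree spread probability mass across classes and raise the entropy. That disagreement is the reason ensemble diversity matters for uncertainty quality.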

