LG · CV · DC · Jan 30

SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks

arXiv:2601.22711v1 · 1 citation · h-index: 6
Originality: Highly original
AI Analysis

The paper addresses inefficient and unreliable inference in machine learning systems and offers a scalable solution with significant performance gains, although it builds incrementally on existing early-exit and ensemble techniques.

The paper tackles the unreliability of single-model confidence thresholds in early-exit neural networks by introducing SQUAD, an inference scheme that integrates early-exit mechanisms with distributed ensemble learning. SQUAD improves test accuracy by up to 5.95% over state-of-the-art dynamic solutions at comparable computational cost and reduces inference latency by up to 70.60% relative to static ensembles.

Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions once sufficient confidence is reached. However, standard approaches typically rely on single-model confidence thresholds, which are frequently unreliable due to inherent calibration issues. To address this, we introduce SQUAD (Scalable Quorum Adaptive Decisions), the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning, improving uncertainty estimation while reducing inference time. Unlike traditional methods that depend on individual confidence scores, SQUAD applies a quorum-based stopping criterion to early-exit learners: it collects intermediate predictions incrementally, in order of computational complexity, until a consensus is reached, and halts computation at that exit if the consensus is statistically significant. To maximize the efficacy of this voting mechanism, we also introduce QUEST (Quorum Search Technique), a Neural Architecture Search method that selects early-exit learners with optimized hierarchical diversity, ensuring that the learners are complementary at every intermediate layer. This consensus-driven approach yields statistically robust early exits, improving test accuracy by up to 5.95% over state-of-the-art dynamic solutions at comparable computational cost, and reducing inference latency by up to 70.60% compared to static ensembles while maintaining good accuracy.
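The quorum-based stopping rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the statistical test, so this sketch assumes a one-sided binomial test on the vote count of the leading class, with hypothetical parameters `p_null` (null probability of voting for that class), `alpha` (significance level) and `min_votes` (minimum quorum). Each early exit of each learner is modeled as a hypothetical `(cost, predict)` pair, where `predict` is called lazily so that exits beyond the stopping point are never evaluated.

```python
from collections import Counter
from math import comb
from typing import Callable, Sequence, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def quorum_early_exit(
    exits: Sequence[Tuple[float, Callable[[], int]]],
    alpha: float = 0.05,    # assumed significance level
    p_null: float = 0.5,    # assumed null vote probability for the leading class
    min_votes: int = 2,     # assumed minimum quorum size
) -> Tuple[int, int, float]:
    """Collect exit predictions cheapest-first and stop once the leading
    class's vote count is significant under a one-sided binomial test.

    exits: hypothetical (cost, predict) pairs, one per early exit of every learner.
    Returns (predicted_label, number_of_exits_evaluated, p_value).
    """
    votes: Counter = Counter()
    p_value = 1.0
    ordered = sorted(exits, key=lambda e: e[0])  # order by computational cost
    for n, (_, predict) in enumerate(ordered, start=1):
        votes[predict()] += 1                     # evaluate this exit only now
        label, k = votes.most_common(1)[0]        # current leading class
        p_value = binomial_tail(k, n, p_null)
        if n >= min_votes and p_value < alpha:    # significant consensus: halt
            return label, n, p_value
    # No significant consensus: fall back to the full-ensemble majority vote.
    label, _ = votes.most_common(1)[0]
    return label, len(ordered), p_value


def fixed(label: int) -> Callable[[], int]:
    """Hypothetical stand-in for an exit head that always predicts `label`."""
    return lambda: label


if __name__ == "__main__":
    exits = [(1.0, fixed(3)), (1.2, fixed(3)), (2.0, fixed(3)),
             (2.5, fixed(3)), (3.0, fixed(3)), (4.0, fixed(7))]
    print(quorum_early_exit(exits))  # (3, 5, 0.03125): the cost-4.0 exit is skipped
```

Because the leading class is chosen after the votes are seen, this binomial test is somewhat optimistic; a stricter variant could lower `alpha` or raise `min_votes`. The sketch also ignores that exits of the same learner share backbone computation, which a real scheduler would exploit.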
