DC · IT · LG · NI · SP · May 17, 2025

Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

arXiv:2505.11788v1 · 3 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work reduces communication costs for distributed language model applications, but the advance is incremental because it builds on existing hybrid language model architectures.

The paper tackles the high communication overhead in hybrid language models by proposing CU-HLM, which uses uncertainty-aware opportunistic transmission and compression, achieving up to 206× higher token throughput while maintaining 97.4% accuracy.

To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, in which an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose a communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. We validate the feasibility of this opportunistic transmission by discovering a strong correlation between the SLM's uncertainty and the LLM's rejection probability. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to the standard HLM, CU-HLM achieves up to 206× higher token throughput by skipping 74.8% of transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
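The device-side decision described in the abstract (upload a truncated distribution only when the SLM is uncertain) can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured or how the vocabulary is truncated, so everything here is an assumption made for illustration: Shannon entropy as the uncertainty measure, top-k truncation with renormalization, and the hypothetical names `entropy`, `truncate_top_k`, `slm_step`, `threshold`, and `k`. The paper derives its own optimal thresholds and truncation strategies, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Optional


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats. ASSUMPTION: used here as the uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def truncate_top_k(probs: List[float], k: int) -> Dict[int, float]:
    """Keep the k most probable token ids and renormalize the kept mass to 1.

    ASSUMPTION: top-k with renormalization stands in for the paper's
    truncation strategy, which the abstract does not specify.
    """
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def slm_step(probs: List[float], threshold: float, k: int) -> Optional[Dict[int, float]]:
    """Return the payload to upload for one draft token, or None to skip the upload.

    Low uncertainty: skip transmission, so the draft token is kept
    without LLM validation. High uncertainty: upload a truncated
    distribution so the LLM can validate the draft token.
    """
    if entropy(probs) <= threshold:
        return None
    return truncate_top_k(probs, k)


# Toy 4-token vocabulary; threshold and k are arbitrary illustrative values.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.40, 0.30, 0.20, 0.10]

skipped = slm_step(confident, threshold=0.5, k=2)   # entropy is low: no upload
payload = slm_step(uncertain, threshold=0.5, k=2)   # entropy is high: upload top-2
```

In this toy setup the confident distribution is never sent, while the uncertain one is reduced from four entries to two before upload. A real deployment would also need to send the draft token itself and handle the LLM's accept or reject response, neither of which is modeled here.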

