LG SPJan 21

Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

Hang Zhao, Hongru Li, Dongfang Xu, Shenghui Song, Khaled B. Letaief

arXiv:2601.14942v11.4h-index: 7

Originality Incremental advance

AI Analysis

This addresses communication bottlenecks for distributed edge intelligence systems, particularly in bandwidth-limited wireless environments, though it appears incremental as an optimization of existing distributed learning approaches.

The paper tackles the challenge of communication-efficient and robust multi-modal edge inference over wireless links by proposing a three-stage distributed learning framework that reduces training communication rounds by 40% while achieving 92.3% accuracy on RGB-depth scene classification, outperforming existing baselines.

Semantic communication is emerging as a key enabler for distributed edge intelligence due to its capability to convey task-relevant meaning. However, achieving communication-efficient training and robust inference over wireless links remains challenging. This challenge is further exacerbated for multi-modal edge inference (MMEI) by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the \emph{multi-modal} nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, we propose a three-stage communication-aware distributed learning framework to improve training and inference efficiency while maintaining robustness over wireless channels. In Stage~I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device--server exchange, thereby reducing the communication cost. In Stage~II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage~III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, optimizing the communication--accuracy tradeoff in the distributed setting. Experiments on RGB--depth indoor scene classification show that the proposed framework attains higher accuracy with far fewer training communication rounds and remains robust to modality degradation or channel variation, outperforming existing self-supervised and fully supervised baselines.

View on arXiv PDF

Similar