CLJul 7, 2025

Put Teacher in Student's Shoes: Cross-Distillation for Ultra-compact Model Compression Framework

arXiv:2507.04636v110 citationsh-index: 9KDD
Originality Highly original
AI Analysis

This work addresses the need for ultra-compact models for NLP tasks in mobile and edge computing environments with strict privacy and real-time requirements, representing a strong specific gain in model compression.

The paper tackles the challenge of deploying efficient NLP models in resource-restricted edge settings by introducing the Edge ultra-lIte BERT framework (EI-BERT), which achieves a remarkably compact BERT-based model of only 1.91 MB, the smallest to date for NLU tasks, and has been deployed in real-world applications like Alipay's recommendation system serving 8.4 million daily active devices.

In the era of mobile computing, deploying efficient Natural Language Processing (NLP) models in resource-restricted edge settings presents significant challenges, particularly in environments requiring strict privacy compliance, real-time responsiveness, and diverse multi-tasking capabilities. These challenges create a fundamental need for ultra-compact models that maintain strong performance across various NLP tasks while adhering to stringent memory constraints. To this end, we introduce Edge ultra-lIte BERT framework (EI-BERT) with a novel cross-distillation method. EI-BERT efficiently compresses models through a comprehensive pipeline including hard token pruning, cross-distillation and parameter quantization. Specifically, the cross-distillation method uniquely positions the teacher model to understand the student model's perspective, ensuring efficient knowledge transfer through parameter integration and the mutual interplay between models. Through extensive experiments, we achieve a remarkably compact BERT-based model of only 1.91 MB - the smallest to date for Natural Language Understanding (NLU) tasks. This ultra-compact model has been successfully deployed across multiple scenarios within the Alipay ecosystem, demonstrating significant improvements in real-world applications. For example, it has been integrated into Alipay's live Edge Recommendation system since January 2024, currently serving the app's recommendation traffic across \textbf{8.4 million daily active devices}.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes