LG · Sep 19, 2025

RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation

arXiv:2509.15724v3 · 2 citations · h-index: 6
Originality: Highly original
AI Analysis

The work addresses the cost and efficiency of deploying models such as BERT and ResNet at the edge, proposing a novel method for a known bottleneck.

The paper introduces RMT-KD, which uses Random Matrix Theory for knowledge distillation to compress large deep learning models for edge deployment. It reports up to 80% parameter reduction with only 2% accuracy loss and 2.8x faster inference.

Large deep learning models such as BERT and ResNet achieve state-of-the-art performance but are costly to deploy at the edge due to their size and compute demands. We present RMT-KD, a compression method that leverages Random Matrix Theory (RMT) for knowledge distillation to iteratively reduce network size. Instead of pruning or heuristic rank selection, RMT-KD preserves only informative directions identified via the spectral properties of hidden representations. RMT-based causal reduction is applied layer by layer with self-distillation to maintain stability and accuracy. On GLUE, AG News, and CIFAR-10, RMT-KD achieves up to 80% parameter reduction with only 2% accuracy loss, delivering 2.8x faster inference and nearly halved power consumption. These results establish RMT-KD as a mathematically grounded approach to network distillation.
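To make the idea of "informative directions identified via the spectral properties of hidden representations" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes the noise bulk of the activation covariance follows the Marchenko-Pastur law with a known noise variance `sigma2`, and it keeps only eigenvectors whose eigenvalues exceed the upper bulk edge. The function names, the `margin` safety factor, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for a sample covariance matrix."""
    gamma = n_features / n_samples  # aspect ratio d / n
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2

def informative_directions(hidden, sigma2=1.0, margin=1.1):
    """Eigenvectors of the activation covariance lying above the noise bulk.

    hidden : array of shape (n_samples, n_features)
    sigma2 : assumed noise variance (an assumption, not from the paper)
    margin : hypothetical safety factor for finite-sample fluctuations
    """
    n, d = hidden.shape
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    threshold = margin * mp_upper_edge(n, d, sigma2)
    keep = eigvals > threshold
    return eigvecs[:, keep], eigvals[keep], threshold

# Synthetic "hidden representations": unit-variance noise plus 3 strong directions.
rng = np.random.default_rng(0)
n, d, k = 2000, 64, 3
signal_dirs, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
latent = 5.0 * rng.standard_normal((n, k))                  # variance 25 per direction
hidden = rng.standard_normal((n, d)) + latent @ signal_dirs.T

basis, kept_eigvals, threshold = informative_directions(hidden)
reduced = hidden @ basis  # project activations onto the retained subspace
print(basis.shape, reduced.shape)
```

In this toy setup the three planted directions sit far above the noise bulk, so the projection shrinks the 64-dimensional representation to the retained subspace. How RMT-KD estimates the noise level, applies the reduction layer by layer, and couples it with self-distillation is described in the paper itself and is not reproduced here.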

