LGME
Oct 22, 2025

Knowledge Distillation of Uncertainty using Deep Latent Factor Model

arXiv:2510.19290v2 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the computational and memory limitations of deep ensembles in on-device AI applications and offers a method for preserving uncertainty in compressed models. The contribution is incremental, however, since it builds on existing knowledge distillation techniques.

The paper compresses deep ensembles used for uncertainty quantification into smaller models by introducing Gaussian distillation, which uses a deep latent factor model to estimate the distribution of the teacher ensemble. It demonstrates improved performance over baselines on benchmark datasets.
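A minimal sketch of what "estimating the teacher ensemble's distribution" can mean, assuming each of the M teacher members has been evaluated at the same n inputs so the ensemble becomes an M x n array. The rows are treated as draws from one Gaussian, its mean and covariance are estimated, and any number of new "members" can then be sampled. This uses the plain sample covariance, not the paper's deep latent factor model, and the function names are hypothetical. The covariance is kept in the factored form D.T @ D so that sampling needs no matrix decomposition.

```python
import numpy as np


def fit_ensemble_gaussian(teacher_outputs):
    """Fit N(mean, cov) to an (M, n) array whose row m is teacher member m at n inputs.

    Returns the mean (n,) and a factor D (M, n) with D.T @ D equal to the
    unbiased sample covariance, so the n x n matrix never has to be factorized.
    """
    M = teacher_outputs.shape[0]
    mean = teacher_outputs.mean(axis=0)
    D = (teacher_outputs - mean) / np.sqrt(M - 1)
    return mean, D


def sample_members(mean, D, num_samples, rng):
    """Draw new 'members' from N(mean, D.T @ D): each row is mean + D.T @ w, w ~ N(0, I_M)."""
    w = rng.standard_normal((num_samples, D.shape[0]))
    return mean + w @ D


# Synthetic stand-in for a teacher ensemble (hypothetical data): 5 members at 8 shared inputs.
rng = np.random.default_rng(0)
teachers = np.linspace(0.0, 2.0, 8) + rng.standard_normal((5, 8))
mean, D = fit_ensemble_gaussian(teachers)
students = sample_members(mean, D, 100_000, rng)
```

Because the M centered rows of D sum to zero, this covariance has rank at most M - 1 (here at most 4 across 8 inputs), so sampled members can only vary along directions already spanned by the original members.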

Deep ensembles deliver state-of-the-art, reliable uncertainty quantification, but their heavy computational and memory requirements hinder their deployment in real applications such as on-device AI. Knowledge distillation compresses an ensemble into small student models, but existing techniques struggle to preserve uncertainty, partly because reducing the size of DNNs typically results in reduced variation. To resolve this limitation, we introduce a new method of distribution distillation (i.e., compressing a teacher ensemble into a student distribution instead of a student ensemble) called Gaussian distillation. It treats each member of the teacher ensemble as a realization of a certain stochastic process and estimates the distribution of the ensemble through a special Gaussian process called the deep latent factor model (DLF). The mean and covariance functions in the DLF model are estimated stably using the expectation-maximization (EM) algorithm. Using multiple benchmark datasets, we demonstrate that the proposed Gaussian distillation outperforms existing baselines. In addition, we illustrate that Gaussian distillation works well for fine-tuning language models and for distribution shift problems.
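The EM step can be illustrated with a classical, non-deep factor analysis model, again assuming the teacher outputs are collected at n shared inputs: member m is modeled as y_m = mu + W z_m + eps_m with z_m ~ N(0, I_K) and eps_m ~ N(0, diag(psi)). In this sketch mu, W and psi are free per-input parameters rather than the learned functions the DLF name suggests, so it shows only the EM mechanics and is not the authors' algorithm. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def factor_loglik(X, W, psi):
    """Log-likelihood of centered rows X under N(0, W W^T + diag(psi))."""
    M, n = X.shape
    C = W @ W.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(X * np.linalg.solve(C, X.T).T)  # sum_m x_m^T C^{-1} x_m
    return -0.5 * (M * n * np.log(2 * np.pi) + M * logdet + quad)


def fit_factor_model_em(Y, num_factors, num_iters=200, seed=0):
    """EM for y_m = mu + W z_m + eps_m, z_m ~ N(0, I), eps_m ~ N(0, diag(psi)).

    Y: (M, n) array whose row m is teacher member m evaluated at n shared inputs.
    """
    rng = np.random.default_rng(seed)
    M, n = Y.shape
    mu = Y.mean(axis=0)                      # closed-form MLE of the mean vector
    X = Y - mu                               # centered teacher outputs
    s_diag = np.mean(X ** 2, axis=0)         # diagonal of the sample covariance
    W = 0.1 * rng.standard_normal((n, num_factors))
    psi = s_diag.copy()
    history = [factor_loglik(X, W, psi)]
    for _ in range(num_iters):
        # E-step: z_m | x_m ~ N(G W^T Psi^{-1} x_m, G), with G = (I + W^T Psi^{-1} W)^{-1}.
        Wt_Pinv = W.T / psi                  # (K, n) = W^T Psi^{-1}
        G = np.linalg.inv(np.eye(num_factors) + Wt_Pinv @ W)
        Ez = X @ Wt_Pinv.T @ G               # (M, K); row m is E[z_m]
        sum_Ezz = M * G + Ez.T @ Ez          # sum_m E[z_m z_m^T]
        # M-step: closed-form update of the loadings W, then of the noise variances psi.
        XtEz = X.T @ Ez                      # (n, K) = sum_m x_m E[z_m]^T
        W = XtEz @ np.linalg.inv(sum_Ezz)
        psi = np.maximum(s_diag - np.sum(W * XtEz, axis=1) / M, 1e-6)
        history.append(factor_loglik(X, W, psi))
    return mu, W, psi, history


def sample_members(mu, W, psi, num_samples, rng):
    """Draw new 'members' mu + W z + eps from the fitted Gaussian."""
    z = rng.standard_normal((num_samples, W.shape[1]))
    eps = rng.standard_normal((num_samples, psi.shape[0])) * np.sqrt(psi)
    return mu + z @ W.T + eps


# Synthetic stand-in for teacher outputs: 200 members, 6 shared inputs, 2 true factors.
rng = np.random.default_rng(1)
W_true = rng.standard_normal((6, 2))
teachers = (np.linspace(-1.0, 1.0, 6)
            + rng.standard_normal((200, 2)) @ W_true.T
            + 0.3 * rng.standard_normal((200, 6)))
mu, W, psi, history = fit_factor_model_em(teachers, num_factors=2)
students = sample_members(mu, W, psi, 1000, rng)
```

Each EM iteration cannot decrease the log-likelihood, which the recorded `history` lets you check, and the fitted covariance W W^T + diag(psi) is full rank because of the positive diagonal noise term.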
