LG · Mar 13, 2021

Robust Model Compression Using Deep Hypotheses

arXiv:2103.07668v1 · 1.6 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and interpretable AI systems by providing a general, model-agnostic compression method, though it appears incremental because it builds on existing notions of statistical depth.

The paper tackles the problem of creating compact and robust machine learning models by developing a model compression scheme that works across different model types, resulting in compressed models that are more accurate and robust than comparable methods.

Machine learning models should ideally be compact and robust. Compactness provides efficiency and comprehensibility, whereas robustness provides resilience. Both topics have been studied in recent years, but in isolation. Here we present a robust model compression scheme that is independent of model type: it can compress ensembles, neural networks and other types of models into diverse types of small models. The main building block is the notion of depth derived from robust statistics. Originally, depth was introduced as a measure of the centrality of a point in a sample, such that the median is the deepest point. This concept was extended to classification functions, which makes it possible to define the depth of a hypothesis and the median hypothesis. Algorithms have been suggested to approximate the median, but they have been limited to binary classification. In this study, we present a new algorithm, the Multiclass Empirical Median Optimization (MEMO) algorithm, which finds a deep hypothesis in multi-class tasks, and we prove its correctness. This leads to our Compact Robust Estimated Median Belief Optimization (CREMBO) algorithm for robust model compression. We demonstrate the success of this algorithm empirically by compressing neural networks and random forests into small decision trees, which are interpretable models, and show that they are more accurate and robust than other comparable methods. In addition, our empirical study shows that our method outperforms Knowledge Distillation on DNN to DNN compression.
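The notion of hypothesis depth mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the formula, so this example assumes a predicate-depth style definition: the depth of a candidate hypothesis is the smallest fraction, over a set of sample points, of hypotheses in a reference pool that agree with the candidate's prediction. The names `make_threshold`, `empirical_depth` and `deepest_index` are hypothetical and chosen for illustration only. This is not the MEMO or CREMBO algorithm, just a brute-force computation of the quantity those algorithms are described as optimizing.

```python
from typing import Callable, Sequence

# A hypothesis maps an input to a class label (any label set works,
# since only equality of predictions is compared).
Hypothesis = Callable[[float], int]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical toy classifier: label 1 if x >= t, else 0."""
    return lambda x: int(x >= t)


def empirical_depth(
    candidate: Hypothesis,
    hypotheses: Sequence[Hypothesis],
    points: Sequence[float],
) -> float:
    """Assumed depth: min over points of the fraction of pool
    hypotheses that predict the same label as the candidate."""
    if not hypotheses or not points:
        raise ValueError("need at least one hypothesis and one point")
    depth = 1.0
    for x in points:
        label = candidate(x)
        agree = sum(1 for h in hypotheses if h(x) == label)
        depth = min(depth, agree / len(hypotheses))
    return depth


def deepest_index(
    candidates: Sequence[Hypothesis],
    hypotheses: Sequence[Hypothesis],
    points: Sequence[float],
) -> int:
    """Index of the candidate with the largest empirical depth."""
    depths = [empirical_depth(c, hypotheses, points) for c in candidates]
    return max(range(len(candidates)), key=depths.__getitem__)


# Toy pool: five threshold classifiers on the real line.
thresholds = [1.0, 2.0, 3.0, 4.0, 5.0]
hypotheses = [make_threshold(t) for t in thresholds]
points = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

best = deepest_index(hypotheses, hypotheses, points)
print(thresholds[best], empirical_depth(hypotheses[best], hypotheses, points))
```

In this toy setting the deepest hypothesis is the classifier with threshold 3.0, the median of the thresholds, with depth 0.6. The extreme thresholds 1.0 and 5.0 each have depth 0.2, which mirrors the statement that the median is the deepest point.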

