LG CV IT · Mar 18, 2020

MINT: Deep Network Compression via Mutual Information-based Neuron Trimming

Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh

arXiv:2003.08472v1 · 13.617 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of reducing computational costs and model size for deep learning practitioners, offering an incremental improvement over existing pruning methods by focusing on inter-layer relationships.

The paper tackles deep neural network compression via pruning by introducing MINT, a method that uses mutual information to trim neurons based on inter-layer dependencies, achieving state-of-the-art performance on benchmarks like MNIST, CIFAR-10, and ILSVRC2012 across various architectures.

Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers which ensures high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. In addition, we discuss our observations of a common denominator between our pruning methodology's response to adversarial attacks and calibration statistics when compared to the original network.
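The retention rule described in the abstract (keep the filters that contribute the majority of the information to the next layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retained_filters`, the `fraction` threshold, and the example scores are all hypothetical, and the score matrix stands in for the conditional geometric mutual information estimates, which are not computed here.

```python
def retained_filters(scores, fraction=0.9):
    """For each downstream filter, keep the smallest set of upstream filters
    whose dependency scores sum to at least `fraction` of that filter's total.

    scores[j][i] is a non-negative dependency score between upstream filter i
    and downstream filter j. It is a placeholder for the paper's
    mutual-information estimate (an assumption of this sketch).
    Returns a list of 0/1 masks with the same shape as `scores`.
    """
    masks = []
    for row in scores:
        total = sum(row)
        mask = [0] * len(row)
        if total > 0:
            # Visit upstream filters from strongest to weakest dependency.
            order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
            running = 0.0
            for i in order:
                mask[i] = 1
                running += row[i]
                if running >= fraction * total:
                    break
        masks.append(mask)
    return masks


# Hypothetical scores: 2 downstream filters, 4 upstream filters.
scores = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.05, 0.70, 0.15],
]
masks = retained_filters(scores, fraction=0.75)
print(masks)
```

In this sketch, a connection masked to 0 would be pruned. How the threshold is chosen and how the scores are estimated are the substantive parts of the method and are left to the paper.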
