LG AI May 10, 2023

Compressing Neural Networks Using Tensor Networks with Exponentially Fewer Variational Parameters

Yong Qing, Ke Li, Peng-Fei Zhou, Shi-Ju Ran

arXiv:2305.06058v312.313 citations

Originality: Highly original

AI Analysis

This work targets machine learning practitioners who need to reduce neural network complexity. Rather than refining existing methods incrementally, it introduces a new compression approach based on deep tensor networks.

The authors address the massive number of variational parameters in neural networks, which can cause overfitting and high hardware costs, by proposing a compression scheme based on deep automatically differentiable tensor networks (ADTN) that contain exponentially fewer parameters. They demonstrate superior compression performance on several neural networks and datasets. For example, they compress two linear layers in VGG-16 from about 10^7 parameters to 424 parameters while improving testing accuracy on CIFAR-10 from 90.17% to 91.74%.

A neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains a massive number of variational parameters. The high complexity of an NN, if unbounded or unconstrained, might unpredictably cause severe issues, including overfitting, loss of generalization power, and unbearable hardware cost. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NNs, regardless of their specific types (linear, convolutional, etc.), by encoding them into a deep automatically differentiable tensor network (ADTN) that contains exponentially fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely recognized NNs (FC-2, LeNet-5, AlexNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTNs with just 424 parameters, improving the testing accuracy on CIFAR-10 from $90.17\%$ to $91.74\%$. We argue that the deep structure of ADTN is an essential reason for its remarkable compression performance, compared to existing compression schemes that are mainly based on tensor decompositions/factorizations and shallow tensor networks. Our work suggests deep TN as an exceptionally efficient mathematical structure for representing the variational parameters of NNs, which exhibits superior compressibility over the commonly used matrices and multi-way arrays.
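The "exponentially fewer parameters" claim can be made concrete with a small parameter-counting sketch. The code below is only an illustration under stated assumptions, not the paper's ADTN architecture: the brick-wall layout of two-site tensors, the local dimension d = 2, the fixed input tensor, and every function name are hypothetical choices made for this example. It shows how a layered network of small tensors contracts into a dense tensor with d^Q entries while its own parameter count grows only linearly with the number of sites Q and the number of layers.

```python
import numpy as np


def brickwall_pairs(n_sites, n_layers):
    """Neighbouring site pairs acted on in each layer (assumed layout).

    Even layers start at site 0, odd layers at site 1.
    """
    return [[(i, i + 1) for i in range(layer % 2, n_sites - 1, 2)]
            for layer in range(n_layers)]


def count_params(pairs, d=2):
    """Free parameters in the network: one (d, d, d, d) tensor per pair."""
    return sum(len(layer) for layer in pairs) * d ** 4


def contract_deep_tn(gates, pairs, n_sites, d=2):
    """Contract the layered network into a dense tensor with d**n_sites entries."""
    # Fixed, non-trainable input: a single 1 at index (0, ..., 0).
    state = np.zeros((d,) * n_sites)
    state[(0,) * n_sites] = 1.0
    k = 0
    for layer in pairs:
        for a, b in layer:
            gate = gates[k]  # axes: (out_a, out_b, in_a, in_b)
            k += 1
            # Contract the gate's input axes with axes a and b of the state.
            state = np.tensordot(gate, state, axes=([2, 3], [a, b]))
            # tensordot puts the gate's output axes first; move them back.
            state = np.moveaxis(state, [0, 1], [a, b])
    return state


if __name__ == "__main__":
    n_sites, n_layers = 10, 4
    pairs = brickwall_pairs(n_sites, n_layers)
    rng = np.random.default_rng(0)
    n_gates = sum(len(layer) for layer in pairs)
    gates = [rng.normal(size=(2, 2, 2, 2)) for _ in range(n_gates)]
    dense = contract_deep_tn(gates, pairs, n_sites)
    print("dense entries:", dense.size)
    print("network parameters:", count_params(pairs))
```

With 10 sites and 4 layers, this toy network has 18 small tensors and 288 parameters, while the dense tensor it generates has 1024 entries. Adding a site doubles the dense size but adds only a fixed number of small tensors per layer. In a compression setting the small tensors would be trained with automatic differentiation so that the contracted result approximates the original NN weights; that fitting step is not shown here.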

