Learning Parameter Sharing with Tensor Decompositions and Sparsity
The work addresses the challenge of deploying large models, such as Vision Transformers and Large Language Models, on resource-constrained systems, and represents an incremental improvement in model compression techniques.
The paper compresses large neural networks for deployment on resource-constrained systems by introducing Fine-grained Parameter Sharing (FiPS). FiPS reduces the parameter budget of MLP modules by 50-75% for DeiT-B and Swin-L and by 40-50% for Gemma-2 and Llama-3 models, while keeping ViT accuracy within 1 percentage point of the original and degrading LLM perplexity only negligibly.
Large neural networks exhibit exceptional performance across numerous tasks, yet their considerable size often hinders deployment on resource-constrained systems. While various model compression strategies have been well studied, parameter sharing remains underexplored. In this paper, we introduce Fine-grained Parameter Sharing (FiPS), a novel algorithm that leverages parameter sharing, tensor decomposition, and sparsity to effectively compress large-scale Vision Transformers (ViTs) and Large Language Models (LLMs). FiPS employs a shared base and sparse factors to represent neurons across multi-layer perceptron (MLP) modules, where initialization is guided by Singular Value Decomposition (SVD) and subsequent optimization is conducted through block-wise reconstruction error minimization. Experimental results show that FiPS reduces the parameter budget of MLP modules by 50-75% for DeiT-B and Swin-L and by 40-50% for various Gemma-2 and Llama-3 models, while keeping ViT accuracy within 1 percentage point of the original and incurring only negligible degradation in LLM perplexity.
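To make the abstract's mechanism concrete, here is a minimal NumPy sketch of the general idea under simplifying assumptions. It is not the authors' FiPS implementation. Each layer's MLP weight is treated as one (d_in, d_out) matrix. Neurons (columns) from all layers are approximated by a single shared base times per-layer sparse factors, and the SVD of the column-stacked weights provides the initialization. Sparsity comes from magnitude pruning of the factors, and "block-wise reconstruction" is reduced to matching each layer's linear output on random calibration inputs through alternating gradient steps. The helper names (`svd_init`, `magnitude_mask`, `reconstruction_loss`, `refine`), the pruning rule, the rank and density values, and the nonlinearity-free objective are all hypothetical choices made for illustration.

```python
import numpy as np

def svd_init(weights, rank):
    """Hypothetical helper: SVD-guided init of a shared base and per-layer factors.

    weights: list of (d_in, d_out) matrices, one per MLP layer (assumed layout).
    Each column (neuron) is approximated as base @ factor_column.
    """
    stacked = np.concatenate(weights, axis=1)              # (d_in, L * d_out)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    base = u[:, :rank] * s[:rank]                          # shared base, (d_in, rank)
    factors = np.split(vt[:rank], len(weights), axis=1)    # L blocks of (rank, d_out)
    return base, factors

def magnitude_mask(factor, density):
    """Keep roughly the `density` fraction of largest-magnitude entries (assumed rule)."""
    k = max(1, int(round(density * factor.size)))
    threshold = np.sort(np.abs(factor), axis=None)[-k]
    return np.abs(factor) >= threshold

def reconstruction_loss(x, weights, base, factors, masks):
    """Sum over layers of the mean squared output error on calibration inputs x."""
    total = 0.0
    for w, v, m in zip(weights, factors, masks):
        residual = x @ w - x @ base @ (v * m)
        total += np.sum(residual ** 2) / x.shape[0]
    return total

def refine(x, weights, base, factors, masks, steps=50):
    """Alternating gradient steps on the masked factors and the shared base.

    Each step size is 1 / (Lipschitz constant) of its quadratic subproblem,
    so no individual update increases the loss.
    """
    n = x.shape[0]
    factors = [v * m for v, m in zip(factors, masks)]
    x_norm2 = np.linalg.norm(x, 2) ** 2                    # squared spectral norm of x
    for _ in range(steps):
        # Update each layer's sparse factor with the base held fixed.
        xb = x @ base
        lr_v = n / (2 * np.linalg.norm(xb, 2) ** 2)
        for i, (w, m) in enumerate(zip(weights, masks)):
            residual = x @ w - xb @ factors[i]
            grad = -2.0 / n * xb.T @ residual
            factors[i] = factors[i] - lr_v * grad * m      # pruned entries stay zero
        # Update the shared base with all factors held fixed.
        c = np.concatenate(factors, axis=1)
        lr_b = n / (2 * x_norm2 * np.linalg.norm(c, 2) ** 2)
        grad_b = np.zeros_like(base)
        for w, v in zip(weights, factors):
            residual = x @ w - x @ base @ v
            grad_b += -2.0 / n * x.T @ residual @ v.T
        base = base - lr_b * grad_b
    return base, factors

# Usage on random stand-in weights and hypothetical calibration activations.
rng = np.random.default_rng(0)
d_in, d_out, n_layers, rank = 64, 128, 4, 48
weights = [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in) for _ in range(n_layers)]
x = rng.standard_normal((256, d_in))

base, factors = svd_init(weights, rank)
masks = [magnitude_mask(v, density=0.5) for v in factors]
before = reconstruction_loss(x, weights, base, factors, masks)
base, factors = refine(x, weights, base, factors, masks)
after = reconstruction_loss(x, weights, base, factors, masks)

dense = n_layers * d_in * d_out
compressed = base.size + sum(int(m.sum()) for m in masks)
print(f"loss {before:.4f} -> {after:.4f}, params kept {compressed / dense:.1%}")
```

The parameter count ignores the storage cost of sparse indices, and the sketch optimizes a linear per-layer objective rather than the output of a full MLP block with its nonlinearity. Both choices keep the example small and are simplifications relative to the method described above.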