TRAWL: Tensor Reduced and Approximated Weights for Large Language Models
This work targets both the efficiency and the accuracy of LLMs, offering a novel approach that is an incremental advance over existing pruning methods.
The paper tackles the problem of improving both efficiency and performance in large language models. It introduces TRAWL, a tensor decomposition technique that reduces and approximates model weights to denoise LLMs, yielding up to a 16% performance improvement over baseline models on benchmark datasets without additional data or training.
Recent research has shown that pruning large-scale language models for inference is an effective way to improve model efficiency, substantially reducing model size with minimal impact on performance. Interestingly, pruning can sometimes even enhance accuracy by removing noise that accumulates during training, particularly when pruning is performed through matrix decompositions. However, recent work has focused primarily on single-matrix decompositions or lower-precision techniques, which may fail to fully capture structural patterns. To address these limitations, we introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to denoise LLMs by capturing global structural patterns. Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
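The core idea, decomposing several weight matrices jointly rather than one at a time, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say which matrices are grouped or which decomposition is used. The sketch assumes that same-shape weight matrices are stacked into a 3-way tensor and approximated with a truncated Tucker decomposition computed by higher-order SVD (HOSVD). The helper names (`unfold`, `mode_dot`, `truncated_hosvd`, `denoise_stacked_weights`) and the synthetic data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Truncated higher-order SVD, one way to compute a Tucker decomposition.

    Returns a core tensor and one orthonormal factor matrix per mode.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto the retained subspace
    return core, factors


def reconstruct(core, factors):
    """Map the core tensor back to the original shape using the factor matrices."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_dot(approx, u, mode)
    return approx


def denoise_stacked_weights(weights, ranks):
    """Hypothetical joint denoising: stack same-shape matrices, approximate, split back."""
    tensor = np.stack(weights, axis=0)  # shape: (num_matrices, rows, cols)
    core, factors = truncated_hosvd(tensor, ranks)
    approx = reconstruct(core, factors)
    return [approx[i] for i in range(approx.shape[0])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num, rows, cols, r = 4, 64, 48, 5
    # Synthetic "clean" matrices that share a low-rank row and column space.
    left = rng.standard_normal((rows, r))
    right = rng.standard_normal((r, cols))
    clean = [left @ np.diag(rng.standard_normal(r)) @ right for _ in range(num)]
    noisy = [w + 0.5 * rng.standard_normal(w.shape) for w in clean]

    denoised = denoise_stacked_weights(noisy, ranks=(num, r, r))
    err_before = sum(np.linalg.norm(n - c) for n, c in zip(noisy, clean))
    err_after = sum(np.linalg.norm(d - c) for d, c in zip(denoised, clean))
    print(f"error before: {err_before:.2f}, after: {err_after:.2f}")
```

Because the four synthetic matrices share a common low-rank structure, the joint approximation recovers that structure and discards most of the added noise, which loosely mirrors the "global structural patterns" argument above. Truncated HOSVD is a simple, non-iterative choice. Alternating refinements such as HOOI, or a CP decomposition, are equally plausible stand-ins, and the sketch makes no claim about which one TRAWL uses.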