CL | LG | Dec 30, 2024

DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

arXiv:2412.20891v1 | 3 citations | h-index: 10 | Has Code | PAKDD
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in efficient fine-tuning of LLMs for researchers and practitioners, offering an incremental improvement over existing tensor adaptation methods.

The paper addresses random initialization in tensor decomposition methods for fine-tuning large language models, which causes performance to diverge from full fine-tuning. It proposes Weight-Decomposed Tensor Adaptation (DoTA), which initializes the adapter from a Matrix Product Operator (MPO) decomposition of the pre-trained weights, and reports improved performance with fewer parameters on commonsense and arithmetic reasoning tasks.

Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet, these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we reveal that random initialization yields a validation loss that diverges significantly from the one achieved by full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in fine-tuning LLMs. Additionally, we introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms random initialization methods with fewer parameters. QDoTA further reduces memory consumption and achieves performance comparable to DoTA on commonsense reasoning tasks. We will release our code to support future research.
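
To make the MPO idea in the abstract concrete, here is a minimal NumPy sketch of a generic MPO (tensor-train matrix) factorization computed by successive truncated SVDs. It is not the authors' released code: the function names `mpo_decompose` and `mpo_reconstruct`, the factor shapes, and the `max_rank` truncation rule are assumptions chosen for illustration, and the abstract does not say how DoTA selects shapes, ranks, or trainable cores.

```python
import numpy as np


def mpo_decompose(weight, in_factors, out_factors, max_rank=None):
    """Factor a 2-D weight into a chain of 4-D cores (hypothetical helper).

    Core k has shape (r_{k-1}, in_factors[k], out_factors[k], r_k),
    with r_0 = r_n = 1. max_rank=None keeps every singular value.
    """
    n = len(in_factors)
    assert len(out_factors) == n
    assert weight.shape == (int(np.prod(in_factors)), int(np.prod(out_factors)))

    # Reshape to (i1..in, j1..jn), then interleave axes to (i1, j1, i2, j2, ...).
    tensor = weight.reshape(*in_factors, *out_factors)
    order = [axis for k in range(n) for axis in (k, n + k)]
    tensor = tensor.transpose(order)

    cores = []
    rank = 1
    remainder = tensor.reshape(rank * in_factors[0] * out_factors[0], -1)
    for k in range(n - 1):
        u, s, vt = np.linalg.svd(remainder, full_matrices=False)
        new_rank = len(s) if max_rank is None else min(max_rank, len(s))
        cores.append(
            u[:, :new_rank].reshape(rank, in_factors[k], out_factors[k], new_rank)
        )
        # Carry the singular values forward into the remaining factor.
        remainder = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
        remainder = remainder.reshape(
            rank * in_factors[k + 1] * out_factors[k + 1], -1
        )
    cores.append(remainder.reshape(rank, in_factors[-1], out_factors[-1], 1))
    return cores


def mpo_reconstruct(cores):
    """Contract the cores back into a 2-D matrix (hypothetical helper)."""
    n = len(cores)
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing bond of the running result with the next core.
        result = np.tensordot(result, core, axes=([-1], [0]))
    result = result[0, ..., 0]  # drop the two boundary bonds of size 1
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1..in, j1..jn).
    order = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    result = result.transpose(order)
    rows = int(np.prod([c.shape[1] for c in cores]))
    cols = int(np.prod([c.shape[2] for c in cores]))
    return result.reshape(rows, cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((24, 24))  # stand-in for a pre-trained weight
    exact = mpo_decompose(w, (2, 3, 4), (4, 3, 2))
    small = mpo_decompose(w, (2, 3, 4), (4, 3, 2), max_rank=2)
    print("exact reconstruction:", np.allclose(mpo_reconstruct(exact), w))
    print("parameters:", w.size, "->", sum(c.size for c in small))
```

With no rank cap the cores reproduce the matrix exactly; capping the bond rank trades reconstruction error for fewer parameters. Cores obtained this way from a pre-trained weight are one plausible starting point for an adapter, in contrast to cores drawn at random.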

