LG Dec 16, 2024

FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation

Dannong Wang, Daniel Kim, Bo Jin, Xingjian Zhao, Tianfan Fu, Steve Yang, Xiao-Yang Liu

arXiv:2412.11378v26.45 citations, h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses resource-efficient local finetuning for financial institutions, but it is incremental as it applies known QLoRA techniques to a specific domain.

The paper tackles the problem of finetuning financial large language models (FinLLMs) under GPU memory constraints and long input sequences by employing quantized low-rank adaptation (QLoRA) with data and pipeline parallelism, achieving substantial improvements in accuracy, GPU memory usage, and time efficiency on financial datasets.

Finetuned large language models (LLMs) have shown remarkable performance in financial tasks, such as sentiment analysis and information retrieval. Due to privacy concerns, finetuning and deploying Financial LLMs (FinLLMs) locally are crucial for institutions. However, finetuning FinLLMs poses challenges including GPU memory constraints and long input sequences. In this paper, we employ quantized low-rank adaptation (QLoRA) to finetune FinLLMs, which leverages low-rank matrix decomposition and quantization techniques to significantly reduce computational requirements while maintaining high model performance. We also employ data and pipeline parallelism to enable local finetuning using cost-effective, widely accessible GPUs. Experiments on financial datasets demonstrate that our method achieves substantial improvements in accuracy, GPU memory usage, and time efficiency, underscoring the potential of low-rank methods for scalable and resource-efficient LLM finetuning.
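To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (1) how a low-rank update B·A shrinks the number of trainable parameters for one weight matrix, and (2) how low-bit quantization shrinks the memory of the frozen base weights. The layer size (4096 × 4096), the rank (8), and all function names are hypothetical choices for illustration and do not come from the paper. The quantizer is a simplified symmetric absmax scheme standing in for the NF4 data type that QLoRA actually uses.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters when the base weight W (d_out x d_in) is frozen
    and only the low-rank factors B (d_out x rank) and A (rank x d_in) train."""
    return rank * (d_out + d_in)


def weight_memory_bytes(num_params, bits):
    """Storage needed for num_params weights at the given bit width."""
    return num_params * bits // 8


def quantize_absmax(values, bits=4):
    """Simplified symmetric absmax quantization (an assumption for
    illustration; QLoRA itself uses the NF4 data type)."""
    levels = 2 ** (bits - 1) - 1          # 7 integer steps per side for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical layer: one 4096 x 4096 projection, LoRA rank 8.
d_out, d_in, rank = 4096, 4096, 8
full_params = d_out * d_in
lora_params = lora_trainable_params(d_out, d_in, rank)

print("full finetuning params:", full_params)
print("LoRA trainable params: ", lora_params)
print("base weights at 16 bit:", weight_memory_bytes(full_params, 16), "bytes")
print("base weights at 4 bit: ", weight_memory_bytes(full_params, 4), "bytes")

# Round trip on a few made-up weights to show the quantization error.
weights = [0.7, -0.35, 0.0, 0.1]
codes, scale = quantize_absmax(weights, bits=4)
restored = dequantize(codes, scale)
print("codes:", codes)
print("restored:", restored)
```

Under these assumed dimensions, the low-rank factors hold 65,536 trainable values against 16,777,216 in the full matrix, and storing the frozen matrix at 4 bits takes a quarter of the memory of 16-bit storage. Real savings depend on the model, the layers adapted, and quantization metadata, none of which this sketch models.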
