CL, AI, LG | Aug 7, 2024

SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature

arXiv:2408.03936v1 | 7 citations | h-index: 3
Originality: Synthesis-oriented
AI Analysis

It addresses a domain-specific problem for users of Mercosur Common Nomenclature, with incremental improvements in fine-tuning efficiency.

This study tackles poor cross-linguistic NLP performance on Mercosur Common Nomenclature (NCM) applications by proposing SLIM-RAFT, a simplified fine-tuning approach that significantly outperformed both TeenyTinyLlama and ChatGPT-4 on the same task.

Natural language processing (NLP) has seen significant advancements with the advent of large language models (LLMs). However, substantial improvements are still needed for languages other than English, especially in specific domains such as applications of the Mercosur Common Nomenclature (NCM), the Harmonized System (HS) based nomenclature used in Brazil. To address this gap, this study uses TeenyTinyLlama, a foundational Portuguese LLM, as the base model for processing NCM applications. Additionally, a simplified Retrieval-Augmented Fine-Tuning (RAFT) technique, termed SLIM-RAFT, is proposed for task-specific fine-tuning of LLMs. This approach retains the chain-of-thought (CoT) methodology for prompt development, but in a more concise and streamlined form, using brief and focused documents for training. The proposed model offers an efficient and cost-effective alternative for fine-tuning smaller LLMs, significantly outperforming TeenyTinyLlama and ChatGPT-4 on the same task. Although the research focuses on NCM applications, the methodology can be easily adapted for HS applications worldwide.
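The idea of training on brief, focused documents with a concise chain-of-thought can be illustrated with a minimal sketch. The abstract does not specify the actual SLIM-RAFT data format, so this is only an assumption about what one RAFT-style training record might look like. The function name `build_example`, the `prompt`/`completion` field names, the use of distractor documents, and the placeholder code `0000.00.00` with its description are all hypothetical and chosen for illustration.

```python
import random


def build_example(question, answer, reasoning, golden_doc, distractors,
                  n_distractors=2, seed=0):
    """Assemble one hypothetical RAFT-style fine-tuning record.

    The relevant ("golden") document is mixed with a few short distractor
    documents, and the target completion holds a brief reasoning step
    followed by the final answer. This layout is an assumption, not the
    format used in the paper.
    """
    rng = random.Random(seed)
    # Sample at most n_distractors irrelevant documents.
    k = min(n_distractors, len(distractors))
    docs = [golden_doc] + rng.sample(distractors, k=k)
    # Shuffle so the golden document's position carries no signal.
    rng.shuffle(docs)
    context = "\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(docs))
    prompt = f"Documents:\n{context}\n\nQuestion: {question}\n"
    completion = f"Reasoning: {reasoning}\nAnswer: {answer}"
    return {"prompt": prompt, "completion": completion}


# Placeholder data only: the code and descriptions below are invented.
record = build_example(
    question="Which NCM code matches 'example product A'?",
    answer="0000.00.00",
    reasoning="The document for code 0000.00.00 describes example product A.",
    golden_doc="0000.00.00: example product A",
    distractors=["1111.11.11: example product B",
                 "2222.22.22: example product C",
                 "3333.33.33: example product D"],
)
print(record["prompt"])
print(record["completion"])
```

Keeping each document to a single line and the reasoning to a single sentence reflects the "concise and streamlined" emphasis in the abstract. A real dataset would draw its documents and questions from the NCM tables themselves.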
