CL, AI | Dec 17, 2024

LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Reasoning

Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

arXiv:2412.12499v23.44 citations | h-index: 10

Originality: Highly original

AI Analysis

This paper addresses language imbalance in AI reasoning, which disadvantages low-resource language communities. It is an incremental improvement that applies a novel method to a known bottleneck.

The paper tackles the performance gap in reasoning tasks between high- and low-resource languages. It proposes LinguaLIFT, a two-stage instruction tuning framework that transfers reasoning capabilities across languages using English-only instruction data. LinguaLIFT outperforms baselines across several benchmarks, including a new multilingual benchmark spanning 48 languages.

Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data. However, a performance gap exists between high- and low-resource language reasoning tasks due to the language imbalance in the pre-training corpus, which is exacerbated by evaluation bias in existing reasoning benchmarks lacking low-resource language coverage. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning. LinguaLIFT employs a language alignment layer to capture multilingual alignment in a code-switched tuning way without requiring multilingual instruction or parallel data, thereby transferring the cross-lingual reasoning capabilities to low-resource languages through English-only instruction tuning data. To comprehensively evaluate the multilingual reasoning capabilities, we introduce the Multilingual Math World Problem (MMWP) benchmark, which spans 21 low-resource, 17 medium-resource, and 10 high-resource languages. Experimental results show that LinguaLIFT outperforms several competitive baselines across MMWP and four widely used benchmarks.
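
To make the idea of code-switched tuning more concrete, here is a minimal sketch of one common way to build code-switched text: swapping some English words for their translations from a bilingual lexicon. The abstract does not say how LinguaLIFT constructs its code-switched data, so this is an assumption for illustration only. The names `TOY_LEXICON` and `code_switch`, the `ratio` parameter, and the tiny English-to-Swahili word list are all hypothetical.

```python
import random

# Hypothetical toy lexicon (English -> Swahili), for illustration only.
# A real setup would need a far larger bilingual dictionary.
TOY_LEXICON = {
    "has": "ana",
    "two": "mbili",
    "three": "tatu",
    "books": "vitabu",
}


def code_switch(sentence, lexicon, ratio=0.5, seed=0):
    """Replace a fraction of in-lexicon words with their translations.

    This is an assumed, dictionary-based substitution scheme, not
    necessarily the procedure used in the paper.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    switched = []
    for token in sentence.split():
        core = token.rstrip(".,?!")   # word without trailing punctuation
        tail = token[len(core):]      # the punctuation that was stripped
        key = core.lower()
        if key in lexicon and rng.random() < ratio:
            switched.append(lexicon[key] + tail)
        else:
            switched.append(token)
    return " ".join(switched)


if __name__ == "__main__":
    # With ratio=1.0 every word found in the lexicon is replaced.
    print(code_switch("Tom has three books.", TOY_LEXICON, ratio=1.0))
    # prints "Tom ana tatu vitabu."
```

Under this assumption, mixed-language sentences like these would expose a model to aligned words across languages without any parallel sentences or multilingual instruction data, which is the property the abstract highlights.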
