Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting
This work addresses performance variability across languages in LLMs, a problem that matters for fairness and utility in multilingual AI applications. The contribution is incremental, since it builds on existing prompting methods.
The paper tackles uneven multilingual performance in large language models by introducing cross-lingual-thought prompting (XLT), a generic template prompt that enhances task performance across languages. XLT yields over 10 points of average improvement on arithmetic reasoning and open-domain question-answering tasks.
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought prompting (XLT), to systematically improve the multilingual capability of LLMs. Specifically, XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages. We conduct comprehensive evaluations on 7 typical benchmarks related to reasoning, understanding, and generation tasks, covering both high-resource and low-resource languages. Experimental results show that XLT not only remarkably enhances the performance of various multilingual tasks but also significantly reduces the gap between the average performance and the best performance of each task in different languages. Notably, XLT brings over 10 points of average improvement in arithmetic reasoning and open-domain question-answering tasks.
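To make the two central ideas concrete, the following is a minimal sketch under stated assumptions, not the template or evaluation code from the paper. It shows (a) what a generic, language-independent template prompt with cross-lingual and step-by-step reasoning instructions might look like, and (b) one plausible way to compute the gap between the best and the average per-language score on a task. The template wording, the slot names (`task`, `language`, `request`), the function names, and the scores in the usage note are all hypothetical.

```python
from statistics import mean

# Hypothetical template: the wording and slot names are assumptions made for
# illustration, not the template published with the paper.
XLT_STYLE_TEMPLATE = (
    "You are an expert in {task} for {language} text.\n"
    "Request: {request}\n"
    "First restate the request in English.\n"
    "Then reason step by step in English.\n"
    "Finally give the result in the form 'Answer: <answer>'."
)


def build_prompt(task: str, language: str, request: str) -> str:
    """Fill the generic template for one task, language, and input."""
    return XLT_STYLE_TEMPLATE.format(task=task, language=language, request=request)


def best_minus_average(scores_by_language: dict[str, float]) -> float:
    """Gap between the best and the mean per-language score on one task.

    This is one assumed reading of "the gap between the average performance
    and the best performance"; a smaller value means more even performance.
    """
    values = list(scores_by_language.values())
    return max(values) - mean(values)
```

For example, `build_prompt("arithmetic reasoning", "Swahili", "Je, 2 + 3 ni ngapi?")` returns a prompt whose instructions stay the same for every language and task, with only the three slots changing. With made-up scores, `best_minus_average({"en": 80.0, "sw": 40.0, "zh": 60.0})` evaluates to `20.0`.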