Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models
This work addresses the confinement of LLMs' advanced reasoning capabilities to high-resource languages, offering a data-efficient method that improves multilingual reasoning without requiring gold answers or parallel data.
The paper proposes an unsupervised Reinforcement Learning approach that enhances multilingual reasoning in LLMs by enforcing cross-lingual self-consistency, achieving average gains of up to 21.7% on MGSM across 10 languages and strong generalization to unseen languages.
Although LLMs continue to expand their multilingual coverage, their advanced reasoning capabilities remain largely confined to a few high-resource languages such as English. Existing methods are limited by the scarcity of multilingual reasoning data and generalize poorly to unseen languages. To address this, we propose an unsupervised Reinforcement Learning (RL) approach that enhances multilingual reasoning by enforcing cross-lingual self-consistency: the principle that a model should produce the same final answer for equivalent problems in different languages. Our approach requires neither gold answers nor parallel data, and it achieves average gains of up to 21.7% on MGSM across 10 languages. It also generalizes strongly, with an 18.2% mean improvement on MGSM languages unseen during training and gains of up to 6.2% on 3 out-of-distribution benchmarks. These results show the potential of consistency-based methods to improve the multilingual capabilities of LLMs without requiring supervised data.
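
The cross-lingual self-consistency principle can be illustrated with a minimal sketch of one possible label-free reward signal. The abstract does not specify the reward actually used, so everything here is an assumption made for illustration: the function names `majority_answer` and `consistency_rewards` are hypothetical, final answers are assumed to be already extracted as strings (with `None` for a response that yields no answer), and agreement with the cross-lingual majority answer stands in for whatever consistency measure the method employs.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_answer(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most common non-missing answer, or None if every answer is missing."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def consistency_rewards(answers_by_lang: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Hypothetical reward: 1.0 if a language's answer matches the majority, else 0.0.

    No gold answer is consulted; the only signal is agreement across languages.
    """
    target = majority_answer(list(answers_by_lang.values()))
    return {
        lang: 1.0 if (ans is not None and ans == target) else 0.0
        for lang, ans in answers_by_lang.items()
    }


# Illustrative final answers for one problem posed in four languages (made-up values).
example = {"en": "18", "de": "18", "sw": "17", "bn": None}
print(consistency_rewards(example))
# {'en': 1.0, 'de': 1.0, 'sw': 0.0, 'bn': 0.0}
```

In an RL setup, per-response scores of this kind could serve as the reward in place of correctness against gold labels. A signal based on agreement alone can also reinforce a consistent but wrong answer, which is a limitation of this sketch and not a statement about the paper's method.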