CL | Mar 26

TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

arXiv:2603.25419 | 87.91 citations | h-index: 9
AI Analysis

This paper addresses multilingual mathematical reasoning for AI applications. It represents an incremental improvement: a novel method applied to a known bottleneck.

The paper tackles the performance disparity of large language models in multilingual mathematical reasoning by introducing Translation-Augmented Policy Optimization (TAPO), a reinforcement learning framework that uses English as a pivot to align language understanding with reasoning. TAPO outperforms baseline methods on multilingual tasks.

Large Language Models (LLMs) have demonstrated remarkable proficiency in English mathematical reasoning, yet a significant performance disparity persists in multilingual contexts, largely attributed to deficiencies in language understanding. To bridge this gap, we introduce Translation-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework built upon GRPO. TAPO enforces an explicit alignment strategy where the model leverages English as a pivot and follows an understand-then-reason paradigm. Crucially, we employ a step-level relative advantage mechanism that decouples understanding from reasoning, allowing the integration of translation quality rewards without introducing optimization conflicts. Extensive experiments reveal that TAPO effectively synergizes language understanding with reasoning capabilities and is compatible with various models. It outperforms baseline methods in both multilingual mathematical reasoning and translation tasks, while generalizing well to unseen languages and out-of-domain tasks.
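
The abstract does not give the formula for the step-level relative advantage, so the following is only a minimal sketch of one way such a mechanism could look. It assumes each sampled response has two segments (an English translation step, then a reasoning step), that each segment receives its own scalar reward, and that each reward is normalized within the sampled group in the usual GRPO style. The function names, the two-segment split, and the reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_level_advantages(translation_rewards, reasoning_rewards):
    """Normalize each step's reward separately (assumed decoupling).

    Returns one (understanding_advantage, reasoning_advantage) pair per
    sampled response, so that translation quality never changes the
    advantage assigned to the reasoning step, and vice versa.
    """
    a_trans = group_relative(translation_rewards)
    a_reason = group_relative(reasoning_rewards)
    return list(zip(a_trans, a_reason))


def token_advantages(n_translation_tokens, n_reasoning_tokens, adv_pair):
    """Broadcast each step's advantage over the tokens of that step only."""
    a_trans, a_reason = adv_pair
    return [a_trans] * n_translation_tokens + [a_reason] * n_reasoning_tokens


if __name__ == "__main__":
    # Hypothetical group of four sampled responses to one non-English problem.
    translation_quality = [0.9, 0.5, 0.7, 0.3]  # e.g. a translation metric
    answer_correct = [1.0, 0.0, 1.0, 0.0]       # e.g. exact-match reward
    pairs = step_level_advantages(translation_quality, answer_correct)
    print(token_advantages(3, 2, pairs[0]))
```

Under these assumptions, a response with a good translation but a wrong answer gets a positive advantage on its translation tokens and a negative one on its reasoning tokens, which is one reading of how decoupling could avoid the optimization conflict the abstract mentions.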

