CL Oct 22, 2025

The Art of Asking: Multilingual Prompt Optimization for Synthetic Data

arXiv:2510.19806v1 · 3 citations · h-index: 22
Originality: Highly original
AI Analysis

This work addresses the challenge of building more robust and culturally grounded multilingual LLMs and represents an incremental improvement over existing translation-based methods.

The paper tackles a bottleneck in multilingual synthetic data generation: translation-based prompts, which limit model generalization. It introduces a lightweight prompt-space optimization framework that improves downstream performance across 12 languages, with gains such as +4.7% on Global-MMLU accuracy and +35.3% wins in preferences on mArenaHard.

Synthetic data has become a cornerstone for scaling large language models, yet its multilingual use remains bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural dimensions, ultimately constraining model generalization. We argue that the overlooked prompt space, the very inputs that define training distributions, offers a more powerful lever for improving multilingual performance. We introduce a lightweight framework for prompt-space optimization, where translated prompts are systematically transformed for Naturalness, Cultural Adaptation, and Difficulty Enhancement. Using an off-the-shelf multilingual LLM, we apply these transformations to prompts for 12 languages spanning 7 families. Under identical data conditions, our approaches achieve substantial and consistent downstream improvements over the translation-only baseline: +4.7% on Global-MMLU accuracy, +2.4% on Flores XCometXL, and +35.3% wins in preferences on mArenaHard. We establish prompt-space optimization as a simple yet powerful paradigm for building multilingual LLMs that are more robust, culturally grounded, and globally capable.
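
As a rough illustration of what transforming translated prompts could look like in practice, the sketch below passes a translated prompt through a sequence of rewriting instructions sent to a multilingual LLM. It is not the authors' implementation: the instruction templates, the function name `optimize_prompt`, the `llm` callable, and the choice to chain the three transformations in order are all assumptions made for illustration. The abstract names only the three transformation types.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical instruction templates. The paper's actual prompts are not
# given in the abstract; these only illustrate the three transformation types.
TRANSFORM_TEMPLATES: Dict[str, str] = {
    "naturalness": (
        "Rewrite the following {language} prompt so it reads as if a native "
        "speaker had written it. Return only the rewritten prompt.\n\n{prompt}"
    ),
    "cultural_adaptation": (
        "Adapt the following {language} prompt so its examples and references "
        "fit the cultural context of {language} speakers. Return only the "
        "adapted prompt.\n\n{prompt}"
    ),
    "difficulty_enhancement": (
        "Make the following {language} prompt more challenging while keeping "
        "its topic. Return only the harder prompt.\n\n{prompt}"
    ),
}


def optimize_prompt(
    translated_prompt: str,
    language: str,
    llm: Callable[[str], str],
    steps: Optional[List[str]] = None,
) -> str:
    """Apply the selected transformations to a translated prompt, in order.

    `llm` is any function that maps an instruction string to generated text,
    for example a thin wrapper around a multilingual model (assumed here).
    Chaining the steps is an assumption; each could also be applied alone.
    """
    if steps is None:
        steps = list(TRANSFORM_TEMPLATES)
    current = translated_prompt
    for step in steps:
        instruction = TRANSFORM_TEMPLATES[step].format(
            language=language, prompt=current
        )
        # Each step rewrites the output of the previous one.
        current = llm(instruction).strip()
    return current
```

Passing `steps=["naturalness"]` applies a single transformation, which makes it straightforward to compare individual transformations against the translation-only baseline under the same data budget.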

