Multilingual Prompting for Improving LLM Generation Diversity
Multilingual prompting addresses cultural bias and limited diversity in LLM outputs, which matters for users who rely on these models in culturally sensitive applications, though it is an incremental improvement over existing prompting methods.
The authors address the lack of cultural representation and diversity in LLM generations by proposing multilingual prompting, which adds cultural and linguistic cues from multiple cultures to a base prompt. They find that it consistently outperforms existing diversity-enhancing techniques across several models.
Large Language Models (LLMs) are known to lack cultural representation and overall diversity in their generations, from expressing opinions to answering factual questions. To mitigate this problem, we propose multilingual prompting: a prompting method that generates several variations of a base prompt with added cultural and linguistic cues from several cultures, generates responses, and then combines the results. Building on evidence that LLMs have language-specific knowledge, multilingual prompting seeks to increase diversity by activating a broader range of cultural knowledge embedded in model training data. Through experiments across multiple models (GPT-4o, GPT-4o-mini, LLaMA 70B, and LLaMA 8B), we show that multilingual prompting consistently outperforms existing diversity-enhancing techniques such as high-temperature sampling, step-by-step recall, and persona prompting. Further analyses show that the benefits of multilingual prompting vary between high- and low-resource languages and across model sizes, and that aligning the prompting language with cultural cues reduces hallucination about culturally specific information.
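
The three steps described in the abstract (build cue-augmented prompt variations, generate a response for each, combine the results) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cue wording in `CUES`, the function names, and the combination rule (pooling responses and dropping exact duplicates) are all assumptions, and `fake_generate` is a stand-in for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical cultural cues keyed by language; the wording actually used
# in the paper is not given in this summary.
CUES: Dict[str, str] = {
    "English": "You are answering for a reader in the United States.",
    "Japanese": "You are answering for a reader in Japan.",
    "Hindi": "You are answering for a reader in India.",
}


def build_variations(base_prompt: str, cues: Dict[str, str]) -> List[str]:
    """Return one prompt per language, each prefixed with its cultural cue."""
    return [
        f"{cue} Respond in {language}. {base_prompt}"
        for language, cue in cues.items()
    ]


def multilingual_prompt(
    base_prompt: str,
    generate: Callable[[str], str],
    cues: Dict[str, str] = CUES,
) -> List[str]:
    """Generate one response per prompt variation and combine them.

    Assumed combination step: pool all responses and drop exact
    duplicates while keeping first-seen order.
    """
    responses = [generate(p) for p in build_variations(base_prompt, cues)]
    return list(dict.fromkeys(responses))


def fake_generate(prompt: str) -> str:
    """Stand-in for an LLM call: returns a canned answer based on the cue."""
    if "Japan" in prompt:
        return "sushi"
    if "India" in prompt:
        return "biryani"
    return "pizza"


if __name__ == "__main__":
    print(multilingual_prompt("Name a popular dish.", fake_generate))
```

In practice `generate` would wrap a call to whichever model is under test, and the combination step could be replaced by any aggregation suited to the task, such as collecting the distinct answers for a diversity measurement.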