The Multilingual Divide and Its Impact on Global AI Safety
This work addresses safety risks for global users of AI in non-dominant languages, but its contribution is incremental: it offers an overview and recommendations rather than new solutions.
The paper tackles the language gap in AI, in which large language models show significantly lower capabilities and safety performance in non-dominant languages, and provides an analysis and recommendations to address these disparities.
Despite advances in large language model capabilities in recent years, a large gap remains in their capabilities and safety performance for many languages beyond a small handful of globally dominant languages. This paper provides researchers, policymakers and governance experts with an overview of key challenges to bridging the "language gap" in AI and minimizing safety risks across languages. We provide an analysis of why the language gap in AI exists and grows, and how it creates disparities in global AI safety. We identify barriers to addressing these challenges, and recommend how those working in policy and governance can help address safety concerns associated with the language gap by supporting multilingual dataset creation, transparency, and research.