"As Eastern Powers, I will veto." : An Investigation of Nation-level Bias of Large Language Models in International Relations
This paper addresses nation-level bias issues for users applying LLMs in International Relations, though the contribution is incremental because it builds on existing debiasing methods.
The paper systematically examines nation-level biases in Large Language Models (LLMs) within International Relations. It finds biases such as favoritism toward Western nations and unfavorability toward Russia, with variations across models and contexts. It also introduces a debiasing framework that reduces bias and improves performance in models such as GPT-4o-mini and Llama-3.3-70B.
This paper systematically examines nation-level biases exhibited by Large Language Models (LLMs) within the domain of International Relations (IR). Leveraging historical records from the United Nations Security Council (UNSC), we developed a bias evaluation framework comprising three distinct tests to explore nation-level bias in various LLMs, with a particular focus on the five permanent members of the UNSC. Experimental results show that, although general bias patterns appear across models (e.g., favorable biases toward Western nations and unfavorable biases toward Russia), the patterns still vary from one LLM to another. Notably, even within the same LLM, the direction and magnitude of bias toward a nation change depending on the evaluation context. This observation suggests that LLM biases are fundamentally multidimensional, varying across models and tasks. We also observe that models with stronger reasoning abilities show reduced bias and better performance. Building on this finding, we introduce a debiasing framework that improves LLMs' factual reasoning by combining Retrieval-Augmented Generation with Reflexion-based self-reflection techniques. Experiments show that it effectively reduces nation-level bias and improves performance, particularly in GPT-4o-mini and Llama-3.3-70B. Our findings emphasize the need to assess nation-level bias alongside performance when applying LLMs in the IR domain.
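The abstract does not spell out how retrieval and self-reflection are combined, so the following is only a minimal sketch of the general pattern, not the authors' implementation. It assumes a toy word-overlap retriever and two caller-supplied callables, `generate` and `critique`, standing in for LLM calls. All function names, parameters, and the sample records are hypothetical placeholders.

```python
from typing import Callable, List, Optional, Tuple


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank records by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer_with_reflection(
    question: str,
    corpus: List[str],
    generate: Callable[[str, List[str], List[str]], str],
    critique: Callable[[str, List[str], str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Retrieve context once, then alternate generation and self-critique.

    `generate` and `critique` are hypothetical stand-ins for LLM calls.
    `critique` returns None when it accepts the answer, otherwise feedback
    text that is passed to the next generation attempt.
    """
    context = retrieve(question, corpus)
    reflections: List[str] = []
    answer = ""
    for _ in range(max_rounds):
        answer = generate(question, context, reflections)
        feedback = critique(question, context, answer)
        if feedback is None:
            break
        reflections.append(feedback)
    return answer, reflections


# Made-up placeholder records, not real UNSC data.
SAMPLE_CORPUS = [
    "Draft resolution A was vetoed by one permanent member.",
    "The weather report for Tuesday was sunny.",
    "Draft resolution B was adopted unanimously.",
]


def stub_generate(question: str, context: List[str], reflections: List[str]) -> str:
    # First attempt ignores the context; later attempts quote the top record.
    return context[0] if reflections else "I think it was adopted."


def stub_critique(question: str, context: List[str], answer: str) -> Optional[str]:
    # Accept only answers that appear verbatim in the retrieved context.
    if answer in context:
        return None
    return "Answer is not supported by the retrieved records; quote them."
```

With the stubs above, calling `answer_with_reflection("Was draft resolution A vetoed?", SAMPLE_CORPUS, stub_generate, stub_critique)` rejects the first unsupported attempt and accepts the second, grounded one. In a real system both callables would wrap model calls and the retriever would query an indexed record collection.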