Friend or Foe? Language as an ideological switch in open-weight LLMs under Russian disinformation stress
For policymakers and developers deploying LLMs in contested information environments, this paper disconfirms the assumption that cultural alignment guarantees resilience and highlights a critical vulnerability for digital sovereignty.
This paper tests whether culturally aligned fine-tuning of LLMs for post-Soviet languages confers resistance to Russian disinformation. Contrary to expectations, the Ukrainian-oriented model showed the weakest resistance when queried in Russian, while the Russian-oriented model showed the strongest rejection of Russian narratives. We call this the 'Fine-Tuning Paradox': corpus composition and language coverage matter more than cultural provenance.
As Russia's war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are being deployed in contested information environments. Policy and industry discourse assumes that culturally aligned adaptation encodes the political orientation of the target community: a Ukrainian-oriented model will resist Russian narratives, and a Russian-oriented one will reinforce them. Does it? This article systematically disconfirms that assumption. We run a controlled audit of four openly available LLMs that share a common base model but are fine-tuned for different linguistic communities, querying them in Ukrainian, Russian and English across ten contested wartime narratives, including Crimea, "denazification", the "one people" thesis, and atrocity denial at Bucha and Mariupol. The result is a Fine-Tuning Paradox: when prompted in Russian, the Ukrainian-oriented model shows the weakest resistance to Russian disinformation, while the Russian-oriented model exhibits the strongest rejection. Corpus composition, language coverage and prompt format prove more decisive than nominal cultural provenance. We situate these findings within debates on hybrid warfare, digital sovereignty and post-imperial information orders, and argue that the principal threat to regional information sovereignty is not adversarial fine-tuning but the untested assumption that cultural alignment guarantees resilience.