CY CL · Feb 14, 2025

Beyond English: Unveiling Multilingual Bias in LLM Copyright Compliance

Yupeng Chen, Xiaoyu Zhang, Yixian Huang, Qian Xie

arXiv:2503.05713v1 · 2 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the problem of inconsistent copyright protection across languages for users and developers of LLMs, though it is incremental as it extends prior English-focused studies to multilingual contexts.

The study investigated multilingual biases in LLM copyright compliance by probing seven LLMs with prompts in English, French, Chinese, and Korean using a dataset of song lyrics, revealing significant imbalances in how copyrighted content is handled across languages.

Large Language Models (LLMs) have raised significant concerns regarding the fair use of copyright-protected content. While prior studies have examined the extent to which LLMs reproduce copyrighted materials, they have predominantly focused on English, neglecting multilingual dimensions of copyright protection. In this work, we investigate multilingual biases in LLM copyright protection by addressing two key questions: (1) Do LLMs exhibit bias in protecting copyrighted works across languages? (2) Is it easier to elicit copyrighted content using prompts in specific languages? To explore these questions, we construct a dataset of popular song lyrics in English, French, Chinese, and Korean and systematically probe seven LLMs using prompts in these languages. Our findings reveal significant imbalances in LLMs' handling of copyrighted content, both in terms of the language of the copyrighted material and the language of the prompt. These results highlight the need for further research and development of more robust, language-agnostic copyright protection mechanisms to ensure fair and consistent protection across languages.
