CL · Feb 20, 2025

Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases

arXiv:2502.14507v1 · 2 citations · h-index: 36 · Has Code · ACL
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of simulating realistic L2 English dialogue for educational applications, though it is incremental in applying existing LLMs to a new linguistic analysis task.

This study evaluated whether large language models (LLMs) can simulate non-native English use by L2 learners influenced by their native languages, finding that modern LLMs replicate L1-dependent linguistic patterns observed in human data, such as tense agreement and collocation biases.

This study evaluates the ability of Large Language Models (LLMs) to simulate the non-native-like English use observed in human second language (L2) learners, whose English is subject to interference from their native first language (L1). In dialogue-based interviews, we prompt LLMs to mimic L2 English learners with specific L1s (e.g., Japanese, Thai, Urdu) across seven languages and compare their outputs to real L2 learner data. Our analysis examines L1-driven linguistic biases, such as reference word usage and avoidance behaviors, using information-theoretic and distributional density measures. Results show that modern LLMs (e.g., Qwen2.5, LLAMA3.3, DeepseekV3, GPT-4o) replicate L1-dependent patterns observed in human L2 data, with distinct influences from different languages (e.g., Japanese, Korean, and Mandarin significantly affect tense agreement, and Urdu influences noun-verb collocations). Our results reveal the potential of LLMs for L2 dialogue generation and evaluation in future educational applications.

Code Implementations: 1 repo
