HC · May 8

From Standard English to Singlish: A Retrieval-Augmented Approach for Code-Switched Creole Generation in Large Language Models

arXiv:2605.07132 · 31.1
Predicted impact: top 60% in HC (last 90 days) · Originality: synthesis-oriented
AI Analysis

For NLP practitioners working on low-resource code-switched varieties, this work offers a practical, auditable alternative to fine-tuning. The contribution is incremental, however: it applies existing RAG techniques to a new domain.

The paper tackles code-switched generation for Singaporean English (Singlish) using a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon. Human evaluation with 164 participants found RAG and zero-shot prompting equally natural and appropriate. In automatic analyses, RAG made minimal substitutions (median 1 token edit, against 23 for zero-shot prompting) and preserved meaning better (mean cosine similarity 0.978 vs. 0.926).

Code-switching in contact varieties like Singaporean English (Singlish) challenges natural language generation due to limited parallel data and rapid lexical evolution. We propose a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon, enabling controlled lexical code-switching without fine-tuning. Our approach retrieves candidate Singlish expressions and guides generation through sparse lexical substitution. Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate. Automatic analyses reveal different transformation regimes: zero-shot prompting induces extensive paraphrasing (median 23 token edits), whereas RAG performs minimal substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926). Our results demonstrate that externalizing code-switching into lexical resources enables control and auditability without sacrificing perceived quality, offering practical advantages for rapidly evolving contact varieties.
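The sparse lexical substitution and the token-edit measure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two-entry lexicon, the function names (`retrieve`, `substitute`, `token_edits`), and the `max_subs` parameter are all hypothetical, and the sketch applies the retrieved substitutions directly with regular expressions, whereas the paper uses the retrieved candidates to guide an LLM. Token edits are counted here as Levenshtein distance over whitespace tokens, which may differ from the paper's exact metric.

```python
import re

# Hypothetical mini-lexicon (Standard English -> Singlish candidate).
# Illustrative entries only; this is NOT the paper's curated lexicon.
LEXICON = {
    "eat": "makan",
    "delicious": "shiok",
}


def retrieve(sentence, lexicon=LEXICON):
    """Return (source, target) pairs whose source phrase occurs in the
    sentence as a whole word, ordered by position of first occurrence."""
    hits = []
    for src, tgt in lexicon.items():
        match = re.search(rf"\b{re.escape(src)}\b", sentence, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), src, tgt))
    return [(src, tgt) for _, src, tgt in sorted(hits)]


def substitute(sentence, max_subs=1, lexicon=LEXICON):
    """Apply at most `max_subs` retrieved substitutions, leaving the rest
    of the sentence untouched (the 'sparse' part of the approach)."""
    out = sentence
    for src, tgt in retrieve(sentence, lexicon)[:max_subs]:
        out = re.sub(rf"\b{re.escape(src)}\b", tgt, out, count=1, flags=re.IGNORECASE)
    return out


def token_edits(a, b):
    """Levenshtein distance over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, tx in enumerate(x, 1):
        cur = [i]
        for j, ty in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete a token
                           cur[j - 1] + 1,             # insert a token
                           prev[j - 1] + (tx != ty)))  # replace a token
        prev = cur
    return prev[-1]


source = "Let's eat at the hawker centre, the food is delicious."
sparse = substitute(source, max_subs=1)
print(sparse)
print(token_edits(source, sparse))
```

Because every substitution traces back to a lexicon entry, the output can be audited by inspecting which entries were retrieved, and the lexicon can be updated without retraining a model. That is the control and auditability property the abstract highlights.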

