CL · Feb 18, 2025

Culturally-Nuanced Story Generation for Reasoning in Low-Resource Languages: The Case of Javanese and Sundanese

arXiv:2502.12932v2 · 2 citations · h-index: 5 · Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
AI Analysis

This paper addresses the scarcity of data for culturally nuanced AI in low-resource languages, though the contribution is incremental because it builds on existing LLM methods.

The paper tackles culturally grounded commonsense reasoning in low-resource languages, specifically Javanese and Sundanese, by testing whether LLMs can generate culturally nuanced narratives. It finds that training on LLM-generated data yields higher downstream performance than training on machine-translated or Indonesian human-authored data.

Culturally grounded commonsense reasoning is underexplored in low-resource languages due to scarce data and costly native annotation. We test whether large language models (LLMs) can generate culturally nuanced narratives for such settings. Focusing on Javanese and Sundanese, we compare three data creation strategies: (1) LLM-assisted stories prompted with cultural cues, (2) machine translation from Indonesian benchmarks, and (3) native-written stories. Human evaluation finds LLM stories match natives on cultural fidelity but lag in coherence and correctness. We fine-tune models on each dataset and evaluate on a human-authored test set for classification and generation. LLM-generated data yields higher downstream performance than machine-translated and Indonesian human-authored training data. We release a high-quality benchmark of culturally grounded commonsense stories in Javanese and Sundanese to support future work.

