CLIR Oct 21, 2024

Limpeh ga li gong: Challenges in Singlish Annotations

arXiv:2410.16156v2, 1 citation, h-index: 3
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of processing colloquial languages such as Singlish in NLP applications, though its contribution is incremental because it focuses on a single task and dataset.

The paper tackles Part-of-Speech (POS) tagging for Singlish (Colloquial Singapore English), building a parallel dataset with English translations and POS tags, both annotated by native speakers. Experiments show that automatic taggers reach only about 80% accuracy against the human annotations, indicating significant room for improvement in computational analysis of the language.

Singlish, or Colloquial Singapore English, is a language formed from oral and social communication within multicultural Singapore. In this work, we address a fundamental Natural Language Processing (NLP) task: Part-of-Speech (POS) tagging of Singlish sentences. For our analysis, we build a parallel Singlish dataset containing direct English translations and POS tags, with translation and POS annotation done by native Singlish speakers. Our experiments show that automatic transition- and transformer-based taggers perform with only $\sim 80\%$ accuracy when evaluated against human-annotated POS labels, suggesting that there is indeed room for improvement in computational analysis of the language. We provide an exposition of challenges in Singlish annotation: its inconsistencies in form and semantics, the highly context-dependent particles of the language, its structurally unique expressions, and the variation of the language across different mediums. Our task definition, resultant labels and results reflect the challenges in analysing colloquial languages formulated from a variety of dialects, and pave the way for future studies beyond POS tagging.
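To make the headline figure concrete, here is a minimal sketch of how token-level tagging accuracy against human labels could be computed. It is not the authors' evaluation code. The function name `token_accuracy`, the tag names, and the two toy tag sequences are all assumptions invented for illustration and are not drawn from the paper's dataset.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> float:
    """Fraction of tokens whose predicted tag equals the human-annotated tag.

    Hypothetical helper: assumes gold and predicted tags are aligned
    token-for-token within each sentence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence must have one tag per token in both inputs")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("cannot compute accuracy over zero tokens")
    return correct / total


# Invented toy tags for two five-token sentences (not from the paper).
# The tagger mislabels the sentence-final particle in each sentence.
gold_tags = [
    ["DET", "NOUN", "ADV", "ADJ", "PART"],
    ["PRON", "VERB", "ADV", "PART", "PUNCT"],
]
pred_tags = [
    ["DET", "NOUN", "ADV", "ADJ", "INTJ"],
    ["PRON", "VERB", "ADV", "INTJ", "PUNCT"],
]

accuracy = token_accuracy(gold_tags, pred_tags)  # 8 of 10 tokens match
```

In this toy case both errors fall on particles, which mirrors one of the annotation challenges the abstract lists. Whether the paper's reported accuracy is computed per token in exactly this way is an assumption.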

