CL · Jun 18, 2021

Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

arXiv:2106.10123v1 · 31.8732 citations

Originality: Synthesis-oriented

AI Analysis

This work critiques the metrics currently used to evaluate code-mixed text in natural language processing for multilingual speakers. It is incremental: it documents limitations of existing metrics rather than proposing new ones.

The paper identifies and demonstrates inherent limitations in existing metrics used to measure the complexity of code-mixed text, using examples from popular datasets to highlight these issues.

Code-mixing is a frequent communication style among multilingual speakers where they mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is a challenging task due to its co-existence with monolingual and noisy text. Over the years, several code-mixing metrics have been extensively used to identify and validate code-mixed text quality. This paper demonstrates several inherent limitations of code-mixing metrics with examples from the already existing datasets that are popularly used across various experiments.
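
To make "code-mixing metric" concrete, here is a minimal sketch of one widely used metric, the Code-Mixing Index (CMI). The abstract above does not name specific metrics, so the choice of CMI is an assumption, as are the function name, the language tags, and the `"univ"` label for language-independent tokens. The formula used is the utterance-level one, 100 * (1 - max_lang_count / (n - u)), where n is the number of tokens and u the number of language-independent tokens.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral_tag="univ"):
    """Utterance-level Code-Mixing Index (CMI) from per-token language tags.

    lang_tags: one language label per token, e.g. ["en", "hi", "univ"].
    neutral_tag: hypothetical label for language-independent tokens
                 (punctuation, named entities, emoji, and so on).
    """
    n = len(lang_tags)
    # Count only tokens that carry a language label.
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    u = n - sum(counts.values())  # language-independent tokens
    if n == u:
        # No language-tagged tokens at all: defined as 0.
        return 0.0
    dominant = max(counts.values())  # tokens in the dominant language
    return 100.0 * (1.0 - dominant / (n - u))


# Two hypothetical tag sequences with the same language proportions
# but a different number of switch points.
blocked = ["en", "en", "hi", "hi"]
alternating = ["en", "hi", "en", "hi"]
print(code_mixing_index(blocked), code_mixing_index(alternating))
```

Because this formula depends only on per-language token counts, both sequences receive the same score even though the second switches language at every token. That is one example of the kind of blind spot a count-based metric can have; whether it matches the specific limitations the paper reports is not stated in the abstract.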
