AI · Jun 16, 2022

PreCogIIITH at HinglishEval: Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality

arXiv:2206.07988v1 · 290 citations · h-index: 46
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of assessing the quality of synthetic code-mixed text for multilingual communities, but it is incremental, as it builds on an existing shared-task framework.

The paper tackles the problem of evaluating the quality of machine-generated code-mixed text, a low-resource task. As part of the HinglishEval shared task, it builds models that predict code-mix quality ratings and reports results in the shared task's competitive benchmark setting.

Code-mixing is the phenomenon of mixing two or more languages in a single speech event, and it is prevalent in multilingual societies. Given the low-resource nature of code-mixing, machine generation of code-mixed text is a common approach to data augmentation. However, evaluating the quality of such machine-generated code-mixed text is an open problem. In our submission to HinglishEval, a shared task co-located with INLG 2022, we attempt to model the factors that impact the quality of synthetically generated code-mixed text by predicting ratings for code-mix quality.
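The title names code-mixing metrics as one input to the quality predictor, but this page does not say which metrics are used. As an illustration only, the sketch below computes the Code-Mixing Index (CMI), a widely used code-mixing metric, from per-token language tags. The function name `code_mixing_index` and the tag labels (`hi`, `en`, `univ`, `ne`, `other`) are hypothetical assumptions, and this is not presented as the paper's actual feature extraction.

```python
from collections import Counter

# Hypothetical tags for language-independent tokens (named entities,
# punctuation, symbols). Real datasets define their own tag sets.
LANG_INDEPENDENT = {"univ", "ne", "other"}


def code_mixing_index(lang_tags: list[str]) -> float:
    """Compute the Code-Mixing Index (CMI) of one utterance.

    CMI = 100 * (1 - max_i(w_i) / (n - u)) if n > u, else 0, where
    n is the number of tokens, u the number of language-independent
    tokens, and w_i the number of tokens tagged with language i.
    """
    n = len(lang_tags)
    # Count tokens per language, skipping language-independent tokens.
    lang_counts = Counter(t for t in lang_tags if t not in LANG_INDEPENDENT)
    u = n - sum(lang_counts.values())
    if n == u:  # no language-tagged tokens (also covers an empty utterance)
        return 0.0
    dominant = max(lang_counts.values())  # tokens in the dominant language
    return 100.0 * (1.0 - dominant / (n - u))


# Example: five tokens, one language-independent, Hindi dominant (3 of 4).
print(code_mixing_index(["hi", "hi", "en", "hi", "univ"]))  # 25.0
```

A monolingual utterance scores 0, and the score rises as tokens are spread more evenly across languages; a per-utterance score like this is the kind of scalar feature that could sit alongside language-model embeddings in a rating predictor.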

Code Implementations: 1 repo
