CL · SI · Jun 16, 2022

JU_NLP at HinglishEval: Quality Evaluation of the Low-Resource Code-Mixed Hinglish Text

arXiv:2206.08053v1 · 23.9290 citations · h-index: 32

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of evaluating text quality in a low-resource, code-mixed language setting, but it is incremental, since it applies an existing method to a new dataset.

The paper tackles the problem of quality evaluation for low-resource, synthetically generated code-mixed Hinglish text by implementing a Bi-LSTM-based neural network model that predicts average rating and disagreement scores. It achieves an F1 score of 0.11 and a mean squared error of 6.0 for the average rating, and an F1 score of 0.18 and a mean squared error of 5.0 for the disagreement score.
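The two reported metrics measure different things: F1 treats scores as discrete labels (higher is better), while mean squared error treats them as continuous values (lower is better). The sketch below shows one way to compute both from a model's continuous outputs. It assumes that predictions are rounded and clamped to an integer rating range before computing a macro-averaged F1; the summary does not say which F1 averaging or rounding the shared task used, and the helper names (`to_labels`, `macro_f1`, `mean_squared_error`) are hypothetical.

```python
from typing import List, Sequence


def to_labels(preds: Sequence[float], low: int, high: int) -> List[int]:
    """Round continuous predictions to integers and clamp them to [low, high].

    Assumption: the rating scale is a closed integer range. Note that Python's
    round() uses banker's rounding for exact .5 values.
    """
    return [min(high, max(low, round(p))) for p in preds]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between gold and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-label F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, with gold labels `[1, 2, 3, 3]` and predictions `[1, 2, 3, 2]`, the single error contributes a mean squared error of 0.25 and lowers the F1 of labels 2 and 3 to 2/3 each, giving a macro F1 of 7/9.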

In this paper, we describe a system submitted to the INLG 2022 Generation Challenge (GenChal) on Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text. We implement a Bi-LSTM-based neural network model to predict the average rating score and the disagreement score of the synthetic Hinglish dataset. In our models, we use word embeddings for English and Hindi data, and one-hot encodings for Hinglish data. We achieve an F1 score of 0.11 and a mean squared error of 6.0 in the average rating score prediction task. In the disagreement score prediction task, we achieve an F1 score of 0.18 and a mean squared error of 5.0.
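The abstract states that Hinglish sentences are represented with one-hot encodings rather than pretrained embeddings. A minimal sketch of that idea follows: build a vocabulary from the training sentences, then turn each sentence into a sequence of one-hot vectors that a sequence model such as a Bi-LSTM could consume. The lowercasing, whitespace tokenization, `<unk>` token for unseen words, and the helper names `build_vocab` and `one_hot_encode` are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

UNK = "<unk>"  # hypothetical placeholder for tokens not seen during training


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercased whitespace token an index; index 0 is UNK."""
    vocab = {UNK: 0}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def one_hot_encode(sentence: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Return one vector of length len(vocab) per token, with a single 1 at its index."""
    rows = []
    for token in sentence.lower().split():
        index = vocab.get(token, vocab[UNK])
        row = [0] * len(vocab)
        row[index] = 1
        rows.append(row)
    return rows
```

With the vocabulary built from `["mujhe movie pasand hai", "movie bahut achhi thi"]` (eight entries including `<unk>`), encoding `"movie achhi hai yaar"` yields four vectors of length 8, and the unseen word `yaar` maps to the `<unk>` position. One-hot vectors grow with the vocabulary and encode no similarity between words, which is one plausible reason such a representation would be paired with a learned recurrent layer.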

