CL | Sep 14, 2022

UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language

arXiv:2209.06668v1265 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This paper provides a resource for developing question answering systems that combat COVID-19 misinformation in Vietnamese, but the contribution is incremental: it applies existing methods to a new dataset.

The authors address COVID-19 misinformation in Vietnam by creating UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset, which comprises 4,500 question-answer pairs from trusted medical sources. They also establish baseline results with deep learning models, evaluated with BLEU, METEOR, and ROUGE-L.

For the last two years, from 2020 to 2021, COVID-19 has overwhelmed disease prevention measures in many countries, including Vietnam, and negatively impacted many aspects of human life and society. Misleading information and fake news about the pandemic circulating in the community are also a serious problem. Therefore, we present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of our dataset and to establish benchmark results for further research, using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effect of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field.
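To make the role of multiple paraphrased answers concrete, the following is a minimal sketch of a sentence-level ROUGE-L F1 score computed against several reference answers, keeping the best match. It is an illustration only, not the authors' evaluation code: whitespace tokenization, lowercasing, the balanced F1 weighting, and the function names `lcs_length`, `rouge_l_f1`, and `multi_ref_rouge_l` are all assumptions made for this example.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 for one candidate/reference pair.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_ref_rouge_l(candidate: str, references: List[str]) -> float:
    """Score against every reference and keep the best one (0.0 if none)."""
    return max((rouge_l_f1(candidate, r) for r in references), default=0.0)


if __name__ == "__main__":
    # Hypothetical answers, not taken from the dataset.
    generated = "wash your hands often"
    paraphrases = [
        "wash hands often",
        "you should wash your hands frequently",
    ]
    print(multi_ref_rouge_l(generated, paraphrases))
```

Under this scheme, adding a paraphrased reference can only keep the score the same or raise it, which is one plausible reason multi-reference evaluation is more forgiving of valid rewordings.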

Code Implementations: 1 repo
