CL | Nov 27, 2023

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

Mai-Vu Tran, Hoang-Quynh Le, Duy-Cat Can, Quoc-An Nguyen

arXiv:2311.15525v1 | 0.51 citations | h-index: 6

Originality: Synthesis-oriented

AI Analysis

The work addresses the need for automated summarization systems for Vietnamese news, but its contribution is incremental because it is limited to a single language and dataset.

The paper tackles Vietnamese abstractive multi-document summarization by introducing a shared task and a human-annotated dataset of 1,839 documents in 600 clusters. Participating systems are ranked by ROUGE2-F1 score.

This paper presents an overview of the VLSP 2022 Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese news. The task is hosted at the 9th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop summarization systems that can automatically create abstractive summaries for a set of documents on a topic. The model input is multiple news documents on the same topic, and the corresponding output is a related abstractive summary. Within the scope of the Abmusu shared task, we focus only on Vietnamese news summarization and build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories. Participating models are evaluated and ranked by ROUGE2-F1 score, the most typical evaluation metric for document summarization.
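
For readers unfamiliar with the metric, ROUGE-2 F1 is the harmonic mean of bigram precision and bigram recall between a system summary and a reference summary. The following is a minimal sketch only, not the task's official scorer: the function names `bigrams` and `rouge2_f1` are hypothetical, and lowercasing plus whitespace tokenization is an assumption (the shared task may apply Vietnamese word segmentation or other preprocessing).

```python
from collections import Counter


def bigrams(tokens):
    """Count the adjacent token pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """ROUGE-2 F1 between two strings.

    Assumption: lowercased, whitespace-split tokens. This is a
    simplification and may differ from the official evaluation.
    """
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    # Clipped overlap: each bigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge2_f1("a b c d", "a b c e")` shares two of three bigrams on each side, so precision and recall are both 2/3 and the score is 2/3.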
