CL AI · Mar 1, 2025

BERT-based model for Vietnamese Fact Verification Dataset

Bao Tran, T. N. Khanh, Khang Nguyen Tuong, Thien Dang, Quang Nguyen, Nguyen T. Thinh, Vo T. Hung

arXiv:2503.00356v1 · 2.7 · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This paper addresses misinformation in Vietnamese information systems, though it is an incremental application of existing methods to a new dataset.

The paper tackles fact verification for Vietnamese text by integrating sentence selection and classification modules into a unified network built on pre-trained PhoBERT and XLM-RoBERTa. It reports a Strict Accuracy of 75.11%, a 28.83% improvement over the baseline.

The rapid advancement of information and communication technology has made information easier to access. This progress has also made stricter verification necessary to ensure that information is accurate, particularly in the context of Vietnam. This paper introduces an approach to Fact Verification on Vietnamese data that integrates both sentence selection and classification modules into a unified network architecture. The proposed approach uses pre-trained PhoBERT and XLM-RoBERTa as the backbone of the network. The proposed model was trained on a Vietnamese dataset named ISE-DSC01 and outperformed the baseline model across all three metrics. Notably, we achieved a Strict Accuracy of 75.11%, a 28.83% improvement over the baseline model.
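The abstract reports Strict Accuracy without defining it. The sketch below shows one plausible reading, in which a claim counts as correct only when both the predicted verdict and the selected evidence sentence match the gold annotation. That definition, the label names (`SUPPORTED`, `REFUTED`, `NEI`), the whitespace and case normalization, and the function name `strict_accuracy` are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple

# One item is (verdict label, evidence sentence). The label names used in the
# examples are assumed, not confirmed by the abstract.
Item = Tuple[str, str]


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences
    do not count as evidence mismatches (an assumption of this sketch)."""
    return " ".join(text.lower().split())


def strict_accuracy(predictions: Sequence[Item], references: Sequence[Item]) -> float:
    """Fraction of claims where BOTH the verdict and the evidence match."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    hits = 0
    for (pred_verdict, pred_evidence), (gold_verdict, gold_evidence) in zip(
        predictions, references
    ):
        verdict_ok = pred_verdict == gold_verdict
        evidence_ok = _normalize(pred_evidence) == _normalize(gold_evidence)
        if verdict_ok and evidence_ok:
            hits += 1
    return hits / len(references)
```

Under this reading, a prediction with the right verdict but the wrong evidence sentence scores zero for that claim, which is why Strict Accuracy is typically lower than verdict-only accuracy.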
