CL | AI | Mar 2, 2025

SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

Dien X. Tran, Nam V. Nguyen, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le

arXiv:2503.00955v3 | 2.72 citations | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses misinformation in low-resource languages such as Vietnamese, though it appears incremental because it builds on existing fact-checking methods.

The paper tackles Vietnamese fact-checking by introducing SemViQA, a framework that integrates Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). It reports state-of-the-art results, with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC. A faster variant, SemViQA Faster, improves inference speed 7x while maintaining competitive accuracy.

The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.
