CL AI HC Oct 21, 2024

Learning to Generate and Evaluate Fact-checking Explanations with Transformers

Darius Feher, Abdullah Khered, Hao Zhang, Riza Batista-Navarro, Viktor Schlegel

arXiv:2410.15669v1 2.74 citations h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses the need for transparent and reliable AI-driven fact-checking systems to combat misinformation, though it appears incremental in building on existing transformer and XAI methods.

The researchers tackle misinformation by developing transformer-based models that generate and evaluate explanations for fact-checking decisions. Their best generative model achieves a ROUGE-1 score of 47.77 for explanation generation, and their best metric learning model reaches a Matthews Correlation Coefficient of around 0.7 against human judgements on objective dimensions.

In an era increasingly dominated by digital platforms, the spread of misinformation poses a significant challenge, highlighting the need for solutions capable of assessing information veracity. Our research contributes to the field of Explainable Artificial Intelligence (XAI) by developing transformer-based fact-checking models that contextualise and justify their decisions by generating human-accessible explanations. Importantly, we also develop models for automatic evaluation of explanations for fact-checking verdicts across different dimensions such as \texttt{(self)-contradiction}, \texttt{hallucination}, \texttt{convincingness} and \texttt{overall quality}. By introducing human-centred evaluation methods and developing specialised datasets, we emphasise the need for aligning Artificial Intelligence (AI)-generated explanations with human judgements. This approach not only advances theoretical knowledge in XAI but also holds practical implications by enhancing the transparency and reliability of AI-driven fact-checking systems and users' trust in them. Furthermore, the development of our metric learning models is a first step towards potentially increasing efficiency and reducing reliance on extensive manual assessment. Based on experimental results, our best-performing generative model achieved a \textsc{ROUGE-1} score of 47.77, demonstrating superior performance in generating fact-checking explanations, particularly when provided with high-quality evidence. Additionally, the best-performing metric learning model showed a moderately strong correlation with human judgements on objective dimensions such as \texttt{(self)-contradiction} and \texttt{hallucination}, achieving a Matthews Correlation Coefficient (MCC) of around 0.7.
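For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a ROUGE-1 F1 score and a binary Matthews Correlation Coefficient can be computed. It is an illustration only, not the authors' evaluation code: the function names `rouge1_f1` and `mcc` are hypothetical, whitespace tokenisation and binary labels are simplifying assumptions, and the abstract does not state which ROUGE-1 variant (precision, recall or F1) the reported 47.77 refers to. ROUGE scores are commonly reported multiplied by 100.

```python
from collections import Counter
from math import sqrt


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference explanation.

    Assumption: lowercased whitespace tokenisation, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Binary Matthews Correlation Coefficient for 0/1 labels.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical example data, not taken from the paper.
    generated = "the claim is false"
    gold = "the claim is true"
    print(round(100 * rouge1_f1(generated, gold), 2))

    human_labels = [1, 1, 0, 0]  # e.g. 1 = explanation contains a hallucination
    model_labels = [1, 1, 0, 0]
    print(mcc(human_labels, model_labels))
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect agreement), which is why a value of around 0.7 is described as a moderately strong correlation with human judgements.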
