AI | MTRL-SCI | IR | Dec 26, 2025

HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification

arXiv:2512.22396v1 | 1 citation | h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses factual inaccuracies in AI-generated scientific content, a critical issue for materials science researchers, and represents a domain-specific incremental improvement.

The paper tackles hallucination in LLM-generated materials science content by introducing HalluMatDetector, a multi-stage verification framework that reduces hallucination rates by 30% compared to standard LLM outputs.

Artificial Intelligence (AI), particularly Large Language Models (LLMs), is transforming scientific discovery, enabling rapid knowledge generation and hypothesis formulation. However, a critical challenge is hallucination, where LLMs generate factually incorrect or misleading information, compromising research integrity. To address this, we introduce HalluMatData, a benchmark dataset for evaluating hallucination detection methods, factual consistency, and response robustness in AI-generated materials science content. Alongside this, we propose HalluMatDetector, a multi-stage hallucination detection framework that integrates intrinsic verification, multi-source retrieval, contradiction graph analysis, and metric-based assessment to detect and mitigate LLM hallucinations. Our findings reveal that hallucination levels vary significantly across materials science subdomains, with high-entropy queries exhibiting greater factual inconsistencies. Using the HalluMatDetector verification pipeline, we reduce hallucination rates by 30% compared to standard LLM outputs. Furthermore, we introduce the Paraphrased Hallucination Consistency Score (PHCS) to quantify inconsistencies in LLM responses across semantically equivalent queries, offering deeper insights into model reliability.
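
The abstract does not give a formula for PHCS, so the following is only a rough illustration of the general idea of measuring inconsistency across responses to paraphrased queries. It is not the paper's definition. The function names, the token-level Jaccard similarity, and the "one minus mean pairwise similarity" aggregation are all assumptions chosen to keep the sketch dependency-free; a real implementation would more likely compare responses with embeddings or an entailment model.

```python
import re
from itertools import combinations


def token_set(text):
    """Lowercase alphanumeric tokens of a response (a crude stand-in for semantics)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap between two responses, in [0, 1]."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty responses are treated as identical
    return len(ta & tb) / len(ta | tb)


def paraphrase_inconsistency(responses):
    """Hypothetical score: 1 minus the mean pairwise similarity of the responses
    an LLM gave to semantically equivalent queries.
    0.0 means all responses share the same tokens; 1.0 means no overlap at all."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to compare")
    pairs = list(combinations(responses, 2))
    mean_similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


if __name__ == "__main__":
    # Made-up responses to three paraphrases of the same question.
    answers = [
        "The band gap of silicon is about 1.1 eV",
        "Silicon has a band gap of about 1.1 eV",
        "The band gap of silicon is about 3.4 eV",
    ]
    print(round(paraphrase_inconsistency(answers), 3))
```

A higher score here flags a query whose paraphrases draw diverging answers. Token overlap cannot tell a harmless rewording from a factual contradiction, so it is a weak proxy for the consistency the paper sets out to measure.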

