CL · AI · Apr 4, 2024

SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection

Bradley P. Allen, Fina Polat, Paul Groth

arXiv:2404.03732v1 · 14.126 citations · h-index: 7 · Has Code · SemEval

Originality: Synthesis-oriented

AI Analysis

This work addresses hallucination detection, a concern for NLP researchers and practitioners. Its contribution is incremental, since it builds on existing prompt-based methods.

The authors address hallucination detection in language model output with a system based on prompt programming and in-context learning with LLMs. The system achieved fourth-best performance in the model-agnostic track and sixth-best in the model-aware track, and a zero-shot approach outperformed a few-shot one.

We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definitions of task, role, and target concept, and automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic and model-aware tracks for Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
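To make the prompting setup in the abstract concrete, here is a minimal sketch of a classifier prompt assembled from a role, a task, and a target concept definition, with optional examples. It is not the authors' implementation. The prompt wording, the `build_prompt` and `classify` names, the label strings, and the `llm` callable are all assumptions made for illustration. With no examples the prompt is zero-shot; with examples it is few-shot.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical wording: the paper's actual prompt text is not reproduced here.
ROLE = "You are a careful annotator of natural language generation output."
TASK = "Decide whether the generated output is a hallucination given the source input."
CONCEPT = (
    "A hallucination is generated output that is fluent but contains "
    "information not supported by the source input."
)

# (source input, generated output, label); labels are assumed strings.
Example = Tuple[str, str, str]


def build_prompt(source: str, output: str, examples: Sequence[Example] = ()) -> str:
    """Assemble a prompt from role, task, and concept definition.

    An empty `examples` sequence yields a zero-shot prompt;
    a non-empty one yields a few-shot prompt.
    """
    parts = [ROLE, TASK, CONCEPT]
    for ex_source, ex_output, ex_label in examples:
        parts.append(f"Input: {ex_source}\nOutput: {ex_output}\nAnswer: {ex_label}")
    parts.append(f"Input: {source}\nOutput: {output}\nAnswer:")
    return "\n\n".join(parts)


def classify(
    llm: Callable[[str], str],
    source: str,
    output: str,
    examples: Sequence[Example] = (),
) -> str:
    """Map the model's free-text completion to one of two assumed labels."""
    completion = llm(build_prompt(source, output, examples)).strip().lower()
    if completion.startswith("hallucination"):
        return "Hallucination"
    return "Not Hallucination"
```

Here `llm` stands for any function that takes a prompt string and returns the model's completion, so the sketch can be exercised with a stub before wiring in a real client.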
