CL · AI · Apr 4, 2024

SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection

arXiv:2404.03732v1 · 26 citations · h-index: 7 · SemEval
Originality: Synthesis-oriented
AI Analysis

This work addresses hallucination detection, a problem of interest to NLP researchers and practitioners, but it is incremental, as it builds on existing prompt-based methods.

The authors tackled hallucination detection in language models by building a system that uses prompt programming and in-context learning with LLMs. The system placed fourth in the model-agnostic track and sixth in the model-aware track, and the authors found that zero-shot prompting outperformed few-shot prompting.

We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work by incorporating context-specific definitions of the task, role, and target concept, and by automatically generating examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic and model-aware tracks for Task 6, respectively, and evaluation on the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
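The abstract describes the classifier only at a high level. As a rough illustration, here is a minimal Python sketch of how a prompt combining a role, a context-specific task description, and a definition of the target concept (hallucination) might be assembled, with optional examples switching it from zero-shot to few-shot. Everything concrete here is an assumption for illustration, not the SHROOM-INDElab implementation (the authors' GitHub repository has that): the prompt wording, the hypothetical `TASK_CONTEXT` strings for the SHROOM task types (definition modelling, machine translation, paraphrase generation), the `llm` callable, and the majority vote over several sampled responses used to produce a label and a rough probability.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical context-specific task descriptions (wording is illustrative only).
TASK_CONTEXT = {
    "DM": "The model was asked to write a dictionary definition of a term used in context.",
    "MT": "The model was asked to translate a sentence into English.",
    "PG": "The model was asked to paraphrase a sentence.",
}
# Hypothetical role and target-concept definitions.
ROLE = "You are an expert annotator checking the outputs of a language model."
CONCEPT = ("An output is a hallucination if it contains information that is "
           "not supported by, or contradicts, the input text.")


@dataclass
class Example:
    source: str
    output: str
    label: str  # "Hallucination" or "Not Hallucination"


def build_prompt(task: str, source: str, output: str,
                 examples: Sequence[Example] = ()) -> str:
    """Assemble role, task context, concept definition, optional examples, and the query."""
    blocks = [ROLE, TASK_CONTEXT[task], CONCEPT]
    for ex in examples:  # empty -> zero-shot prompt; non-empty -> few-shot prompt
        blocks.append(f"Input: {ex.source}\nOutput: {ex.output}\nAnswer: {ex.label}")
    blocks.append(f"Input: {source}\nOutput: {output}\n"
                  "Answer with 'Hallucination' or 'Not Hallucination'.\nAnswer:")
    return "\n\n".join(blocks)


def parse_label(response: str) -> bool:
    """Return True if the response labels the output as a hallucination."""
    # "not hallucination" does not start with "hallucination", so it maps to False.
    return response.strip().lower().startswith("hallucination")


def classify(llm: Callable[[str], str], task: str, source: str, output: str,
             examples: Sequence[Example] = (), n_samples: int = 5) -> Tuple[str, float]:
    """Query the (assumed) LLM n_samples times and take a majority vote."""
    prompt = build_prompt(task, source, output, examples)
    votes = [parse_label(llm(prompt)) for _ in range(n_samples)]
    p_hallucination = sum(votes) / n_samples
    label = "Hallucination" if p_hallucination > 0.5 else "Not Hallucination"
    return label, p_hallucination
```

Passing automatically generated `Example` objects turns the same prompt into the few-shot variant; per the abstract, the zero-shot variant was the more accurate of the two.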

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it rather than by global fame.
