CL | AI | Aug 18, 2022

Active PETs: Active Data Annotation Prioritisation for Few-Shot Claim Verification with Pattern Exploiting Training

arXiv:2208.08749v2269 citations | h-index: 43 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of limited labelled data for fact-checking systems, offering an incremental improvement in data annotation prioritisation for few-shot learning.

The paper tackles few-shot claim verification by proposing Active PETs, a method that actively selects unlabelled data for annotation using an ensemble of Pattern Exploiting Training (PET) models. It reports consistent performance improvements over baselines on two datasets with six language models.

To mitigate the impact of the scarcity of labelled data on fact-checking systems, we focus on few-shot claim verification. Although recent work on few-shot classification has proposed advanced language models, there is a dearth of research on data annotation prioritisation, which improves the selection of the few shots to be labelled for optimal model performance. We propose Active PETs, a novel weighted approach that utilises an ensemble of Pattern Exploiting Training (PET) models based on various language models to actively select unlabelled data as candidates for annotation. Using Active PETs for few-shot data selection shows consistent improvement over the baseline methods on two technical fact-checking datasets, using six different pretrained language models. We show further improvement with Active PETs-o, which additionally integrates an oversampling strategy. Our approach enables effective selection of instances to be labelled where unlabelled data is abundant but resources for labelling are limited, leading to consistently improved few-shot claim verification performance. Our code is available.
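
The selection idea described above can be illustrated with a small sketch. The abstract does not state how the ensemble weights are set or how candidates are scored, so everything here is an assumption: the sketch uses weighted vote entropy (a standard query-by-committee disagreement measure) as the score, and plain random oversampling as a stand-in for the oversampling strategy of Active PETs-o. The function names, the label strings, and the example weights are hypothetical and are not taken from the authors' code.

```python
import math
import random


def weighted_vote_entropy(votes, weights, labels):
    """Entropy of the weight-normalised vote distribution for one instance.

    votes[i] is the label predicted by committee member i and weights[i] is
    that member's weight. The weighting scheme is an assumption here.
    """
    total = sum(weights)
    entropy = 0.0
    for label in labels:
        mass = sum(w for v, w in zip(votes, weights) if v == label) / total
        if mass > 0:
            entropy -= mass * math.log(mass)
    return entropy


def select_for_annotation(pool_votes, weights, labels, k):
    """Return indices of the k pool instances the committee disagrees on most."""
    scores = [weighted_vote_entropy(v, weights, labels) for v in pool_votes]
    ranked = sorted(range(len(pool_votes)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def oversample(examples, seed=0):
    """Randomly duplicate minority-class examples until all classes are balanced.

    This is naive random oversampling, used only as an illustrative stand-in.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical committee of three models voting on three unlabelled claims.
LABELS = ["SUPPORTS", "REFUTES", "NOT_ENOUGH_INFO"]
pool_votes = [
    ["SUPPORTS", "SUPPORTS", "SUPPORTS"],         # full agreement
    ["SUPPORTS", "REFUTES", "NOT_ENOUGH_INFO"],   # full disagreement
    ["SUPPORTS", "SUPPORTS", "REFUTES"],          # partial disagreement
]
weights = [1.0, 2.0, 1.0]  # illustrative per-model weights
chosen = select_for_annotation(pool_votes, weights, LABELS, k=2)
```

In this sketch, instances on which the weighted committee agrees score zero and are never prioritised, while instances with split votes rise to the top of the annotation queue. Once the chosen instances are labelled, `oversample` could be applied to the labelled set before retraining, if class imbalance is a concern.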

