CL · May 9, 2024

Detecting Statements in Text: A Domain-Agnostic Few-Shot Solution

arXiv:2405.05705v1 · 1 citation
Originality: Incremental advance
AI Analysis

The paper provides a domain-agnostic solution for tasks in Computational Social Science and Web Content Analysis, though the advance is incremental because it builds on existing few-shot and NLI techniques.

The paper tackles the problem of classifying text based on claims without large annotated datasets by proposing a few-shot learning methodology using Natural Language Inference models and dynamic sampling, which rivals traditional fine-tuning approaches while reducing annotation needs.

Many tasks related to Computational Social Science and Web Content Analysis involve classifying pieces of text based on the claims they contain. State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce. In light of this, we propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task. This methodology involves defining the classes as arbitrarily sophisticated taxonomies of claims, and using Natural Language Inference models to obtain the textual entailment between these and a corpus of interest. The performance of these models is then boosted by annotating a minimal sample of data points, dynamically sampled using the well-established statistical heuristic of Probabilistic Bisection. We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification and depression-related symptoms detection. This approach rivals traditional pre-train/fine-tune approaches while drastically reducing the need for data annotation.
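
The abstract names Probabilistic Bisection without describing how it is applied, so the following is only a rough sketch of the generic heuristic, not the authors' released code. It assumes, purely for illustration, that the quantity being searched for is a single decision threshold on entailment scores in [0, 1], and that each annotation answers "would an item scored x be labeled positive?" correctly with probability p. The names `probabilistic_bisection`, `oracle`, `p`, and `grid_size` are hypothetical.

```python
import random


def _median_index(weights):
    """Return the first index at which the cumulative weight reaches one half."""
    half = sum(weights) / 2.0
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if running >= half:
            return i
    return len(weights) - 1


def probabilistic_bisection(oracle, n_queries=40, p=0.8, grid_size=1001):
    """Estimate a threshold in [0, 1] from noisy yes/no answers.

    oracle(x) should return True when it believes the threshold is <= x
    (hypothetically: an annotator labels an item scored x as positive).
    p is the assumed probability that a single answer is correct.
    """
    if not 0.5 < p < 1.0:
        raise ValueError("p must lie strictly between 0.5 and 1")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    posterior = [1.0 / grid_size] * grid_size  # uniform prior over thresholds

    for _ in range(n_queries):
        # Query at the posterior median: the most informative next point.
        m = _median_index(posterior)
        says_at_or_below = oracle(grid[m])
        # Up-weight hypotheses that agree with the answer, down-weight the rest.
        for i in range(grid_size):
            agrees = (i <= m) == says_at_or_below
            posterior[i] *= p if agrees else 1.0 - p
        total = sum(posterior)
        posterior = [w / total for w in posterior]

    return grid[_median_index(posterior)]


if __name__ == "__main__":
    rng = random.Random(0)
    true_threshold = 0.623  # made-up value for the demonstration

    def noisy_annotator(x):
        # Gives the correct answer 80% of the time, the opposite otherwise.
        truth = x >= true_threshold
        return truth if rng.random() < 0.8 else not truth

    print(probabilistic_bisection(noisy_annotator, n_queries=60))
```

Each query is placed at the posterior median, so every annotation splits the remaining belief roughly in half. This is why the heuristic can get by with a small number of labeled points.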

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
