CL · Feb 17, 2023

Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification

arXiv:2302.08957v3 · 104 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses domain drift in text classification, particularly for content moderation on social media platforms, but is incremental as it builds on an existing method.

The paper tackles the problem of deploying few-shot text classification systems by proposing LaGoNN, a modification to SetFit that improves performance without adding learnable parameters, achieving better results in content moderation and general classification tasks.

Few-shot text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models. SetFit (Tunstall et al., 2022) is a recent, practical approach that fine-tunes a Sentence Transformer under a contrastive learning paradigm and achieves similar results to more unwieldy systems. Inexpensive text classification is important for addressing the problem of domain drift in all classification tasks, and especially in detecting harmful content, which plagues social media platforms. Here, we propose Like a Good Nearest Neighbor (LaGoNN), a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor in the training data, for example, the label and text, making novel data appear similar to an instance on which the model was optimized. LaGoNN is effective both at flagging undesirable content and at general text classification, and it improves the performance of SetFit. To demonstrate the value of LaGoNN, we conduct a thorough study of text classification systems in the context of content moderation under four label distributions, and in general and multilingual classification settings.
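
The input-modification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for a Sentence Transformer encoder, and the function name `augment_with_neighbor`, the `[SEP]` separator, and the order of the appended fields are assumptions made for illustration only.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a Sentence Transformer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment_with_neighbor(text, train_texts, train_labels, sep=" [SEP] "):
    """Append the label and text of the most similar training example.

    No parameters are learned here: the input is only rewritten so that
    novel data looks closer to an instance seen during training.
    The separator and field order are hypothetical.
    """
    query = embed(text)
    sims = [cosine(query, embed(t)) for t in train_texts]
    best = max(range(len(train_texts)), key=sims.__getitem__)
    return f"{text}{sep}{train_labels[best]}{sep}{train_texts[best]}"


# Hypothetical training data for a content-moderation setting.
train_texts = ["you are an idiot", "have a lovely day"]
train_labels = ["harmful", "neutral"]

augmented = augment_with_neighbor("what an idiot you are", train_texts, train_labels)
print(augmented)
```

In a SetFit-style pipeline, the augmented string would be passed to the classifier in place of the raw input; which neighbor fields to append is a design choice that this sketch does not settle.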

Code Implementations: 1 repo
