CL · AI · Oct 14, 2022

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

arXiv:2210.07652v1223 citations, h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses the need for more inclusive and explainable AI in value-sensitive tasks such as toxicity detection. It is an incremental advance because it builds on existing LLM and fine-tuning methods.

The paper tackles the problem of aligning NLP classifiers with diverse human values. It introduces a framework in which the classifier predicts based on human values written explicitly in the command, and its models surpass the baselines by at least 15.56% on the F1-score.

Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models with the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using classifiers with explicit human value input improves both inclusivity & explainability in AI.
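
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt template, the `[SEP]` separator, the label names, the example texts, and every function name below are assumptions made for illustration. The LLM call itself is left out; the sketch only shows how a value statement could be placed explicitly in the few-shot prompt (step 1) and in the input of the smaller classifier (step 2).

```python
from dataclasses import dataclass


@dataclass
class Example:
    value: str  # explicitly written human value (hypothetical wording)
    text: str   # text to be judged
    label: str  # hypothetical label set: "violates" / "acceptable"


def build_few_shot_prompt(demos, value, text):
    """Step 1 (sketch): format labelled demonstrations plus one unlabelled
    query so that an LLM can complete the missing label."""
    blocks = [
        f"Value: {d.value}\nText: {d.text}\nLabel: {d.label}" for d in demos
    ]
    # The query block ends with an open "Label:" for the LLM to fill in.
    blocks.append(f"Value: {value}\nText: {text}\nLabel:")
    return "\n\n".join(blocks)


def to_classifier_input(example, sep=" [SEP] "):
    """Step 2 (sketch): pair the value with the text so that a smaller
    classifier is conditioned on the value explicitly."""
    return example.value + sep + example.text, example.label


# Hypothetical demonstrations; real ones would come from annotated data.
DEMOS = [
    Example("People should not be judged by their gender.",
            "She got the job only because she is a woman.", "violates"),
    Example("People should not be judged by their gender.",
            "She got the job because of her experience.", "acceptable"),
]

prompt = build_few_shot_prompt(
    DEMOS,
    value="People should not be judged by their gender.",
    text="Men are naturally better at managing teams.",
)
model_input, label = to_classifier_input(DEMOS[0])

print(prompt)
print(model_input, "->", label)
```

In this sketch the same text could receive a different label under a different value statement, which is the property the framework relies on. The labelled pairs produced in step 1 would then serve as training data for the fine-tuning in step 2.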
