CL · AI · LG · Oct 17, 2023

Eliciting Human Preferences with Language Models

Meta AI · MIT · Stanford
arXiv:2310.11589v1 · 96 citations · h-index: 25 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the problem of aligning language models to complex human preferences. The advance is incremental because it builds on existing elicitation methods.

The paper tackles the challenge of specifying tasks for language models by proposing Generative Active Task Elicitation (GATE), a framework where models interact with users to elicit preferences, and shows that this method often yields more informative responses than user-written prompts or labels.

Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging, especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. We propose to use *LMs themselves* to guide the task specification process. In this paper, we introduce **Generative Active Task Elicitation (GATE)**: a learning framework in which models elicit and infer intended behavior through free-form, language-based interaction with users. We study GATE in three domains: email validation, content recommendation, and moral reasoning. In preregistered experiments, we show that LMs prompted to perform GATE (e.g., by generating open-ended questions or synthesizing informative edge cases) elicit responses that are often more informative than user-written prompts or labels. Users report that interactive task elicitation requires less effort than prompting or example labeling and surfaces novel considerations not initially anticipated by users. Our findings suggest that LM-driven elicitation can be a powerful tool for aligning models to complex human preferences and values.

Code Implementations: 1 repo