AI | CL | Feb 26, 2025

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction

arXiv:2502.18744v3 | 1 citation | h-index: 4 | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the cost and scalability of preference dataset construction for LLM alignment. The advance is incremental because it builds on existing alignment methods.

The paper tackles the high annotation cost of LLM alignment by proposing ZEBRA, a zero-annotation framework that constructs preference datasets from model behavior knowledge derived from benchmark performance. It reports alignment performance comparable to instance-supervised methods without manual or model-based labeling.

Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or Artificial Intelligence (AI) annotators. However, such approaches rely on instance-wise supervision, incurring substantial annotation cost and limited interpretability. In this paper, we propose ZEBRA - a model behavior-wise zero-annotation framework that constructs preference data by leveraging model behavior knowledge derived from benchmark performances. ZEBRA binarizes response pairs by evaluating the quality and similarity of their origin models, entirely bypassing instance-level annotation. This allows scalable, controllable, and cost-effective alignment data generation. Empirical results show that ZEBRA achieves alignment performance comparable to instance-supervised methods, despite requiring no manual or model-based labeling.
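
The model-level binarization described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the benchmark scores are made up, "quality" is taken to be the mean benchmark score, "similarity" is taken to be cosine similarity between score vectors, and the rule "keep only pairs of similar models, prefer the higher-quality one" is just one plausible pairing strategy. The names `BENCHMARKS`, `quality`, `similarity`, and `build_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical benchmark scores per model (illustrative numbers, not from the paper).
BENCHMARKS = {
    "model_a": [0.80, 0.70, 0.90],
    "model_b": [0.60, 0.50, 0.70],
    "model_c": [0.30, 0.90, 0.20],
}


def quality(scores):
    """Assumed quality measure: mean benchmark score of a model."""
    return sum(scores) / len(scores)


def similarity(u, v):
    """Assumed similarity measure: cosine similarity of two score vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_pairs(prompt, responses, benchmarks, min_similarity=0.95):
    """Label response pairs using only knowledge about their origin models.

    `responses` maps a model name to that model's response for `prompt`.
    No response is ever inspected or scored individually.
    """
    pairs = []
    for m1, m2 in combinations(sorted(responses), 2):
        # Assumed rule: only pair models whose benchmark behavior is similar.
        if similarity(benchmarks[m1], benchmarks[m2]) < min_similarity:
            continue
        q1, q2 = quality(benchmarks[m1]), quality(benchmarks[m2])
        if q1 == q2:
            continue  # no preference can be derived from equal quality
        chosen, rejected = (m1, m2) if q1 > q2 else (m2, m1)
        pairs.append({
            "prompt": prompt,
            "chosen": responses[chosen],
            "rejected": responses[rejected],
        })
    return pairs


if __name__ == "__main__":
    demo = build_pairs(
        "Explain overfitting.",
        {"model_a": "response A", "model_b": "response B", "model_c": "response C"},
        BENCHMARKS,
    )
    print(demo)
```

With these made-up scores, only `model_a` and `model_b` pass the similarity threshold, and the response from `model_a` is labeled as chosen because its mean score is higher. The point of the sketch is that the preference label comes entirely from model-level benchmark data, which is what lets the approach skip instance-level annotation.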

