CL | May 29, 2025

Dataset Cartography for Large Language Model Alignment: Mapping and Diagnosing Preference Data

Seohyeong Lee, Eunwon Kim, Hwaran Lee, Buru Chang

arXiv:2505.23114v26.72 citations | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the scalability challenge in LLM alignment by making preference data collection more efficient for researchers and practitioners. The advance is incremental, since it builds on existing preference datasets and on existing tools such as GPT-4o.

The paper tackles the inefficiency of collecting human preference data for aligning large language models by introducing Alignment Data Map, a GPT-4o-assisted tool that analyzes preference data to identify high-quality samples. The results show that training on only 33% of the data, drawn from the high-mean, low-variance region of the map, achieves performance comparable to or better than training on the entire dataset.

Human preference data plays a critical role in aligning large language models (LLMs) with human values. However, collecting such data is often expensive and inefficient, posing a significant scalability challenge. To address this, we introduce Alignment Data Map, a GPT-4o-assisted tool for analyzing and diagnosing preference data. Using GPT-4o as a proxy for LLM alignment, we compute alignment scores for LLM-generated responses to instructions from existing preference datasets. These scores are then used to construct an Alignment Data Map based on their mean and variance. Our experiments show that using only 33 percent of the data, specifically samples in the high-mean, low-variance region, achieves performance comparable to or better than using the entire dataset. This finding suggests that the Alignment Data Map can significantly improve data collection efficiency by identifying high-quality samples for LLM alignment without requiring explicit annotations. Moreover, the Alignment Data Map can diagnose existing preference datasets. Our analysis shows that it effectively detects low-impact or potentially misannotated samples. Source code is available online.
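
The mean-and-variance mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0-10 score scale, and the ranking rule (mean minus standard deviation) are all assumptions made for illustration, and the scores are taken as already computed by some judge model. The abstract does not state how the high-mean, low-variance region is delimited.

```python
import math
from statistics import mean, pvariance


def build_data_map(scores_by_sample):
    """Map each sample id to the (mean, variance) of its alignment scores.

    `scores_by_sample` is a hypothetical dict: sample id -> list of scores,
    one per LLM-generated response to that sample's instruction.
    """
    return {
        sample_id: (mean(scores), pvariance(scores))
        for sample_id, scores in scores_by_sample.items()
    }


def select_high_mean_low_variance(data_map, fraction=0.33):
    """Return about `fraction` of the sample ids, favoring high mean and low variance.

    Assumption: samples are ranked by mean minus standard deviation. This is
    an illustrative heuristic, not the region rule used in the paper.
    """
    keep = max(1, round(len(data_map) * fraction))
    ranked = sorted(
        data_map,
        key=lambda sid: data_map[sid][0] - math.sqrt(data_map[sid][1]),
        reverse=True,
    )
    return ranked[:keep]
```

Under these assumptions, a sample whose responses all score consistently high ranks above one with the same best score but widely scattered responses, which is the behavior the high-mean, low-variance criterion is meant to capture.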

