DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
This work helps AI researchers and developers understand what drives human preference judgments, and offers insights for constructing the datasets used to align LLMs with human values. The contribution is incremental, since it builds on existing data and models.
The paper analyzes pairwise human preference judgments released by OpenAI to identify influential factors such as output length and factual consistency. It finds that the least favored factors (outputs that are too brief, off-focus, or contain hallucinated facts) tend to be consistent across tasks, while the most favored factors vary across tasks and genres.
Human preference judgments are pivotal in guiding large language models (LLMs) to produce outputs that align with human values. Human evaluations are also used in summarization tasks to compare outputs from various systems, complementing existing automatic metrics. Despite their significance, however, there has been limited research probing these pairwise or $k$-wise comparisons. The collective impact and relative importance of factors such as output length, informativeness, fluency, and factual consistency are still not well understood. It is also unclear whether other hidden factors influence human judgments. In this paper, we conduct an in-depth examination of a collection of pairwise human judgments released by OpenAI. Utilizing the Bradley-Terry-Luce (BTL) model, we reveal the inherent preferences embedded in these human judgments. We find that the most favored factors vary across tasks and genres, whereas the least favored factors tend to be consistent, e.g., outputs that are too brief or that contain excessive off-focus content or hallucinated facts. Our findings have implications for the construction of balanced datasets in human preference evaluations, which is a crucial step in shaping the behaviors of future LLMs.
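To make the BTL step concrete, the following is a minimal sketch of fitting a Bradley-Terry-Luce model to pairwise outcomes, where the probability that item $i$ is preferred over item $j$ is $w_i / (w_i + w_j)$. It assumes each judgment has already been reduced to a (preferred, dispreferred) pair of factor labels, which is a simplification and not necessarily the paper's exact setup. The function name `fit_btl`, the factor labels, and the toy counts are all hypothetical, and the fit uses the standard minorization-maximization (MM) update rather than any procedure confirmed by the source.

```python
from collections import defaultdict


def fit_btl(comparisons, n_iter=500, tol=1e-12):
    """Estimate BTL strengths from (winner, loser) pairs with the MM update.

    Assumes every item wins at least once and the comparison graph is
    connected; otherwise the maximum-likelihood estimate is degenerate.
    """
    wins = defaultdict(float)         # total wins per item
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        items.update((winner, loser))
    items = sorted(items)

    # Start from uniform strengths that sum to 1.
    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        # Strengths are identified only up to scale, so renormalize.
        total = sum(updated.values())
        updated = {i: v / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return strength


# Hypothetical toy data: 10 comparisons per pair of factor labels.
toy = (
    [("informative", "too_brief")] * 8 + [("too_brief", "informative")] * 2
    + [("informative", "hallucinated")] * 9 + [("hallucinated", "informative")] * 1
    + [("too_brief", "hallucinated")] * 6 + [("hallucinated", "too_brief")] * 4
)
toy_strengths = fit_btl(toy)
for name, value in sorted(toy_strengths.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Sorting the fitted strengths gives a ranking from most to least favored label. Because the strengths are normalized to sum to 1, only their ratios are meaningful: a ratio $w_i / w_j$ is the estimated odds that $i$ is preferred over $j$.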